Applewhite and the ‘mysterious PDF’

Our friend Hermitian posted a link to a PDF found on the Muscatine Journal Web site.

The METADATA from 4db82608b486f.pdf also indicates that the PDF creator tool was Adobe Photoshop CS2 and the producer tool was Adobe Photoshop for Windows -- Image Conversion Plug-in. The original file type was TIFF (200 PPI x 200 PPI) and the PDF was created with Photoshop.

So now we can attempt to recreate the workflow

Adobe mentions a tool which they call the Creative Suite Image Conversion Utility. Not a plugin but let’s assume for the moment that we can learn from the workflow.

To start the conversion, a server drops art files into a production folder monitored by the AIR application (1). The AIR app performs three actions when it detects new files. First, it opens the art files in Illustrator CS5 (2) and locates the layer named STYLE. All descendants of the STYLE layer are of interest to the customer. The AIR app saves each layer in each file as separate EPS and PSD files (3). After the EPS and PSD files are saved, the Image Converter in Illustrator closes the original files without saving the layers and then notifies the AIR application that the conversion is complete (4). Then the AIR app sends the PSD files to Photoshop CS5 (2) where they are opened and converted to TIFF files (3). After the art files are saved as TIFF files, the Image Converter in Photoshop notifies the AIR app that the conversion is complete (4). Finally, the AIR app moves the completed files to an Output folder (5) and deletes all the temporary files. When the AIR app posts the EPS and TIFF files to the Output folder, the server automatically sends the files downstream for consumption by merchandising systems (6).

The previous scripted workflow that needed to be manually started and monitored is now entirely automated. Employees are no longer required to monitor a server for the arrival of new files and then manually start the file conversion script. A once costly, opaque, and error-prone process is now streamlined and fast.

This does not yet explain how the PDF file was created, but let’s figure out what we can learn about the previous scripted workflow, as Adobe CS2 is quite ancient. Wikipedia mentions that it was first released on April 4, 2005. Since then we have seen CS3, CS4, CS5 and CS6, followed by CC. So far no luck, but the image was transmitted from AP, as it shows the same DCSA103 information.

Need to dive deeper.
The AP Document shows (cleaned up, carriage returns removed or added and reordered to align tags)

‘<?xpacket begin=”\xef\xbb\xbf” id=”W5M0MpCehiHzreSzNTczkc9d”?>
<x:xmpmeta xmlns:x=”adobe:ns:meta/” x:xmptk=”Adobe XMP Core 4.0-c321 44.398116, Tue Aug 04 2009 14:24:39″>
<rdf:RDF xmlns:rdf=”http://www.w3.org/1999/02/22-rdf-syntax-ns#”>
<rdf:Description rdf:about=”” xmlns:photoshop=”http://ns.adobe.com/photoshop/1.0/”><photoshop:Source>
AP
</photoshop:Source><photoshop:Country>
USA
</photoshop:Country><photoshop:Credit>
AP
</photoshop:Credit>
<photoshop:City>
Washington
</photoshop:City><photoshop:CaptionWriter>
JSA RCL**DC**
</photoshop:CaptionWriter><photoshop:DateCreated>
2011-04-27
</photoshop:DateCreated><photoshop:TransmissionReference>
DCSA103
</photoshop:TransmissionReference><photoshop:State>
DC
</photoshop:State><photoshop:Urgency>
5
</photoshop:Urgency>
<photoshop:Instructions>
HANDOUT IMAGE PROVIDED BY THE WHITE HOUSE
</photoshop:Instructions><photoshop:Category>
A
</photoshop:Category><photoshop:AuthorsPosition>
STF
</photoshop:AuthorsPosition></rdf:Description>
<rdf:Description rdf:about=”” xmlns:dc=”http://purl.org/dc/elements/1.1/”>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang=”x-default”>
Obama
</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>
J. Scott Applewhite
</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:description>
<rdf:Alt>
<rdf:li xml:lang=”x-default”>
This handout image provided by the White House shows a copy of the long form of President Barack Obama\’s birth certificate from Hawaii. (AP Photo/J. Scott Applewhite)</rdf:li>
</rdf:Alt>
</dc:description>
</rdf:Description>
</rdf:RDF></x:xmpmeta>
<?xpacket end=”r”?>
’

The Muscatine document shows
<?xpacket begin=”\xef\xbb\xbf” id=”W5M0MpCehiHzreSzNTczkc9d”?>
<x:xmpmeta xmlns:x=”adobe:ns:meta/” x:xmptk=”3.1.1-111″>

<rdf:RDF xmlns:rdf=”http://www.w3.org/1999/02/22-rdf-syntax-ns#”>
<rdf:Description rdf:about=”” xmlns:photoshop=”http://ns.adobe.com/photoshop/1.0/”><photoshop:Source>
AP
</photoshop:Source><photoshop:Country>
USA
</photoshop:Country><photoshop:Credit>
AP
</photoshop:Credit>
<photoshop:City>
Washington
</photoshop:City><photoshop:CaptionWriter>
JSA RCL**DC**
</photoshop:CaptionWriter><photoshop:DateCreated>
2011-04-27
</photoshop:DateCreated><photoshop:TransmissionReference>
DCSA103
</photoshop:TransmissionReference>
<photoshop:State>
DC
</photoshop:State><photoshop:Urgency>
5
</photoshop:Urgency>
<photoshop:Instructions>
HANDOUT IMAGE PROVIDED BY THE WHITE HOUSE
</photoshop:Instructions><photoshop:Category>
A
</photoshop:Category><photoshop:AuthorsPosition>
STF
</photoshop:AuthorsPosition><photoshop:ColorMode>
3
</photoshop:ColorMode>
<photoshop:ICCProfile>
QCT RGB settings
</photoshop:ICCProfile>
<photoshop:History/></rdf:Description>
<rdf:Description rdf:about=”” xmlns:dc=”http://purl.org/dc/elements/1.1/”>
<dc:format>
application/pdf
</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li
xml:lang=”x-default”>
Obama
</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>
J. Scott Applewhite
</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:description>
<rdf:Alt>
<rdf:li xml:lang=”x-default”>
This handout image provided by the White House shows a copy of the long form of President Barack Obama\’s birth certificate from Hawaii. (AP Photo/J. Scott Applewhite)
</rdf:li>
</rdf:Alt>
</dc:description>
<dc:rights>
<rdf:Alt>
<rdf:li xml:lang=”x-default”>
AP2011
</rdf:li>
</rdf:Alt>
</dc:rights>
</rdf:Description>

So let’s look at the DCTDecode object (jpeg) which may help us understand what happened here

AP document:
obj 11 0
Type: /XObject
Referencing: 9 0 R, 7 0 R
Contains stream<<
/Subtype /Image
/Length 1064112
/Filter /DCTDecode
/Name /X
/Metadata 9 0 R
/BitsPerComponent 8
/ColorSpace 7 0 R
/Width 2698
/Height 3234
/Type /XObject >>

Muscatine document:
obj 11 0
Type: /XObject
Referencing:
Contains stream<<
/Subtype /Image
/Length 294131
/Filter /DCTDecode
/Name /X
/BitsPerComponent 8
/ColorSpace /DeviceRGB
/Width 1043
/DecodeParms<<
/Columns 1043
/Blend 1
/VSamples [2 1 1 2]
/HSamples [2 1 1 2]
/Rows 1243
/Colors 3
/ColorTransform 1
/QFactor 0.25
>>
/Height 1243
/Type /XObject
>>

Lots of fun stuff. Time to do a deeper dive into the PDF.

AP document       : 2698x3234  ratio 0.83426097711812
Muscatine document: 1043x1243  ratio 0.8390989541432

But we already knew that the Muscatine document was cropped... 
So let's compare apples and apples...
Compare the width of box 21

AP Document: 1387  @200 PPI   6.935''
Muscatine  :  555  @120 PPI   4.625''

So the Muscatine document was scaled by 2/3 and cropped.

AP        : 13.49 × 16.17 inches  971.28pt 1164.24pt   72.0  dpi   0.36    200 PPI  
Muscatine :  8.7  × 10.36 inches  626.04pt  745.56pt   71.95 PPI   0.6     120 PPI

Ratios:   0.6449 x 0.6407
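The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration using the pixel dimensions and page sizes in points quoted above; the function names are mine and not part of any tool. Ratios computed from the point sizes can differ in the last decimals from the ones above, which were derived from rounded inch values.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def page_size_inches(width_pt, height_pt):
    """Convert a page size in points to inches."""
    return width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH


def effective_ppi(pixels, points):
    """Pixels per inch implied by spreading `pixels` across `points` of page."""
    return pixels / (points / POINTS_PER_INCH)


# Pixel dimensions and page sizes (in points) as quoted in the post.
documents = {
    "AP": {"pixels": (2698, 3234), "points": (971.28, 1164.24)},
    "Muscatine": {"pixels": (1043, 1243), "points": (626.04, 745.56)},
}

for name, doc in documents.items():
    (w_px, h_px), (w_pt, h_pt) = doc["pixels"], doc["points"]
    w_in, h_in = page_size_inches(w_pt, h_pt)
    print(f"{name}: {w_in:.3f} x {h_in:.3f} in, "
          f"{effective_ppi(w_px, w_pt):.2f} x {effective_ppi(h_px, h_pt):.2f} PPI")

# Ratio of the Muscatine page size to the AP page size, per axis.
ratio_w = documents["Muscatine"]["points"][0] / documents["AP"]["points"][0]
ratio_h = documents["Muscatine"]["points"][1] / documents["AP"]["points"][1]
print(f"Ratios: {ratio_w:.4f} x {ratio_h:.4f}")
```

The effective resolution of the Muscatine image comes out very close to, but not exactly, 120 PPI, which is consistent with a crop that did not land on a whole number of points.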

I repeated the process in GIMP and the two images overlap perfectly

Open 200×200 PPI AP document

Scale image

X-resolution 200->120

Y-resolution 200->120

Scale Pixels by 120/200

Now scale resulting picture by 2/3

The manipulated AP document is now 1079×1294 versus 1043×1243 for the Muscatine document, a difference of 36 pixels horizontally and 51 pixels vertically.

Cross check

36 pixels horizontal: 16 on the left and 20 on the right
51 pixels vertical: 49 on top and 2 on the bottom
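The scaling and cropping steps above can be checked with a short Python sketch. It assumes the two scalings are combined into a single exact factor (120/200 times 2/3); applying them one after the other with rounding in between may differ by a pixel. The variable names are illustrative only.

```python
from fractions import Fraction


def scale_dims(width, height, factor):
    """Scale pixel dimensions by an exact rational factor, rounded to whole pixels."""
    return round(width * factor), round(height * factor)


ap_size = (2698, 3234)          # AP image, in pixels
muscatine_size = (1043, 1243)   # Muscatine image, in pixels

# 200 -> 120 PPI resample followed by a further 2/3 reduction, as one factor (2/5).
factor = Fraction(120, 200) * Fraction(2, 3)

scaled = scale_dims(*ap_size, factor)
extra_w = scaled[0] - muscatine_size[0]
extra_h = scaled[1] - muscatine_size[1]

# Crop offsets from the cross check: 16 pixels on the left, 49 on top.
left, top = 16, 49
right = extra_w - left
bottom = extra_h - top

# Crop rectangle (left, top, right edge, bottom edge) inside the scaled AP image.
crop_box = (left, top, left + muscatine_size[0], top + muscatine_size[1])

print("scaled AP size:", scaled)
print("extra pixels  :", (extra_w, extra_h))
print("margins L/R/T/B:", (left, right, top, bottom))
print("crop box      :", crop_box)
```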

Check….

I turned on and off the layers and found that other than color differences they show the same pixels… If I can find a way to do a pixel by pixel comparison….
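A pixel by pixel comparison could look roughly like the sketch below. It is a minimal illustration, not a finished tool: it assumes both images have already been aligned, cropped to the same size and loaded as rows of (R, G, B) tuples, and it compares gray values with a tolerance so that the known color differences do not swamp the result. The function names and the tolerance parameter are my own choices.

```python
def to_gray(pixel):
    """Integer luma approximation of an (R, G, B) pixel, in the range 0..255."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def compare_pixels(img_a, img_b, tolerance=0):
    """Compare two equally sized images given as rows of (R, G, B) tuples.

    Returns the (x, y) positions whose gray values differ by more than
    `tolerance`, together with the largest gray difference found.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    mismatches = []
    max_diff = 0
    for y, (row_a, row_b) in enumerate(zip(img_a, img_b)):
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            diff = abs(to_gray(pixel_a) - to_gray(pixel_b))
            max_diff = max(max_diff, diff)
            if diff > tolerance:
                mismatches.append((x, y))
    return mismatches, max_diff


# Tiny made-up example: two 2x1 images that differ slightly in the first pixel.
image_a = [[(0, 0, 0), (255, 255, 255)]]
image_b = [[(0, 0, 10), (255, 255, 255)]]
print(compare_pixels(image_a, image_b, tolerance=0))
print(compare_pixels(image_a, image_b, tolerance=1))
```

On real files the pixel grids would come from an imaging library; Pillow's `ImageChops.difference`, for example, computes a per-pixel difference image directly.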

32 thoughts on “Applewhite and the ‘mysterious PDF’”

  1. NBC

    “So now we can attempt to recreate the workflow

    “Adobe mentions a tool which they call the Creative Suite Image Conversion Utility. Not a plugin but let’s assume for the moment that we can learn from the workflow.

    Did anyone else notice this latest diversionary ploy posted by NBC.

    And did you also notice that NBC made another one of his classic empty promises ?

    NBC and the entire Obot horde cannot complete a sentence if it doesn’t contain the word “WORKFLOW”.

    Now we know that Adobe has tons of tools for analyzing and creating PDF files.

    We didn’t need NBC to tell us that.

    What do you bet that all of his posted technical mumbo jumbo about the Adobe product was lifted from some Adobe Web Page.

    And the beauty of this latest diversion is that it has nothing to do with the central question, which is where did Applewhite get the pre-release White background paper copy of the Obama LFCOLB before anybody else got it ?

  2. NBC

    Oh Goody! A photo montage DIVERSION !!!

    So NBC totally blew off my important observation that the first (purportedly scanned to PDF image) of Obama’s LCOLB which was published on the internet was created at 09:00:38 by AP photographer Scott J. Applewhite approximately 12-1/2 minutes after the start of the press gaggle at 8:48 AM at the White House.

    I have examined the bottom edge of this copy at 400% zoom. There is no evidence of the bleed-through image of the Obama short-form Certification of Live Birth.

    NOW EVERYBODY SHOULD KNOW THAT THIS WAS NOT APPLEWHITE’S SO-CALLED “AP” COPY.

    And then strangely at only 28 minutes later at 9:28:48 AM Applewhite created another scanned to PDF image from one of the pale-Blue handout copies. This second image is the so-called “AP” copy.

    This is the widely distributed pale-Blue background image which does have the bleed-through of the Obama short-form COLB.

    However, NBC’s diversionary Obama LFCOLB Photo splash is useful because it proves that, no matter how many images that NBC could grab, none was created as early as the first PDF image produced by Applewhite. Consequently, Applewhite has the distinction of being the first out of the box with a PDF image of the Obama LFCOLB.

    He even beat out the White House to be the first to release a PDF copy. Of course the WH LFCOLB PDF image looks nothing like Applewhite’s 1st creation. The WH LFCOLB has the Green basket-weave safety paper background but Applewhite’s creation has a near-white background.

    And Applewhite’s Early Bird Special is the only one created and produced by means of Photoshop and saved as a Photoshop PDF image file.

    It also happens to be the only GrayScale image of the Obama LFCOLB that has a near- White Background.

    The Applewhite “AP” pale-Blue copy was created at:

    <-<<<< 2011-04-27T09:28:48-04:00>>>->

    The creator tool was:

    <-<<<Adobe Acrobat 8.26>>>->

    And it was produced by:

    <-<<<Adobe Acrobat 8.26 Image Conversion Plug-in>>>->

    So why did Applewhite switch from Photoshop to Acrobat in a span of only 28 minutes?

    It’s not common knowledge but it is possible to scan directly into Photoshop for some versions.

    See: “How To Scan Using Adobe Photoshop”

    http://www.public.asu.edu/~mdelahun/are598/how_to_scan.html

    But then everything else is weird about Obama’s birth certificate.

    So why am I surprised at these little shenanigans?

    Just remember !

    Photoshop PDF and Adobe PDF were not created equal !

    But naturally NBC is not at all interested in the Applewhite Follys (Follies).

  3. P. S.

    The Applewhite Early Bird Special LFCOLB PDF METADATA has an embedded comment that it is a copy of one of the handout copies.

    <-<<<AP>>>->
    <-<<<USA>>>->
    <-<<<AP>>>->
    <-<<<Washington>>>->
    <-<<<JSA RCL**DC**>>>->
    <-<<<2011-04-27>>>->
    <-<<<DCSA103>>>->
    <-<<<DC>>>->
    <-<<<5>>>->
    <-<<<HANDOUT IMAGE PROVIDED BY THE WHITE HOUSE>>>->
    <-<<<A>>>->
    <-<<<STF>>>->
    <-<<<3>>>->
    <-<<<QCT RGB settings>>>->
    <-<<< >>>->

  4. Your editor is still eating most of the METADATA text…

    Ah I see what you mean by “my editor”. You mean the comment box provided for by WordPress?

    Yes, I noticed this myself…

    You need to properly escape tags that contain > and < and it chokes on Unicode text as well… Darn annoying.

  5. But naturally NBC is not at all interested in the Applewhite Follys (Follies).

    They are a minor distraction at best.

    So NBC totally blew off my important observation that the first (purportedly scanned to PDF image) of Obama’s LCOLB which was published on the internet was created at 09:00:38 by AP photographer Scott J. Applewhite approximately 12-1/2 minutes after the start of the press gaggle at 8:48 AM at the White House.

    You appear to be still a little off on the timing. I have been working on a posting to clarify some of this

    On Apr 27, 2011, at 8:53:21 AM, Scott Applewhite created a jpeg from a TIFF of the hand-out provided by the White House.

    Check out the time stamp in the jpeg embedded in the PDF

    ap_obama_certificate_dm_110427-000.jpeg

    Date Time Original: Apr 27, 2011 8:53:21 AM

    This is from the JPEG embedded in

    ap_obama_certificate_dm_110427.pdf

    The newspaper PDF shows the following metadata

    /AdobePhotoshop
    /LastModified "(D:20110427090136-06'00')"
    /LastModified "(D:20110427090139-06'00')"
    

    Or 10:01:36 DC Time (the -6:00 shows that it was created in a time zone consistent with Iowa, the location of the newspaper in question). So the white version was created from the bluish one.

  6. Let me explain the likely workflow

    Applewhite captured the handout on his camera and extracted it on his computer, converted it to a JPEG and sent the document to the DC office where a PDF was created.

    The Newspaper site obtained Applewhite’s photograph and edited it by cropping.

    These Applewhite jpegs all show the COLB shining through.

  7. The WH LFCOLB has the Green basket-weave safety paper background but Applewhite’s creation has a near-white background.

    Yes, when the document was copied in B&W it appears that most of the basket weave was removed. Of course you can still see some remnants. This document was distributed with a bluish tint. This was likely caused by the lighting conditions; regardless, the newspaper cleaned it up.

    It’s not that hard to do with the right tools.

    I am not sure why Hermitian continues to make a big deal of the timeline which, in today’s digital age, is quite reasonable. Of course there are variations possible here.

    The RAW formatted photos are directly sent to the DC office where they are processed.

    Regardless, the blue and white documents completely overlap, so again, I am not sure why Hermitian has to conclude: forgery.

    I am quite interested in the Applewhite document, which is why I have been working on a detailed timeline. There are some fascinating issues to be resolved.

    Especially a -4:00 date offset in the AP PDF document…

    But so far there appears to be nothing pointing to an outright forgery.

  8. Did anyone else notice this latest diversionary ploy posted by NBC.

    I understand that anything that shows your position to be flawed is now diversionary🙂

    And did you also notice that NBC made another one of his classic empty promises ?

    Which one this time?

    NBC and the entire Obot horde cannot complete a sentence if it doesn’t contain the word “WORKFLOW”.

    Because the workflow helps unravel how a document was created.

    Now we know that Adobe has tons of tools for analyzing and creating PDF files.

    Yes, somehow no one reported on the detailed code analysis. Even Hermitian is now quoting from the raw PDF file… Finally people are trying to understand the source of the rotations and scalings…

    We didn’t need NBC to tell us that.

    What do you bet that all of his posted technical mumbo jumbo about the Adobe product was lifted from some Adobe Web Page.

    Imagine that, I actually provide supporting evidence for my claims.

    And the beauty of this latest diversion is that it has nothing to do with the central question, which is where did Applewhite get the pre-release White background paper copy of the Obama LFCOLB before anybody else got it ?

    What pre-release? You said yourself that the press conference started around 8:47 IIRC.

    Applewhite captured a photo and created a jpeg at Date Time Original: Apr 27, 2011 8:53:21 AM

    And sent it to the DC office. This is the bluish document. The white document was created from this bluish document an hour later by a newspaper in Iowa.

  9. A while back I scanned a color printout of the LFBC on my home all-in-one machine in grayscale. The green security background disappeared almost exactly as it did in the AP photo with about every setting I tried. The point is that it is not unusual that it disappeared in a black and white scan or photo. It could be that it disappeared when the White House copied the presser handout in black and white. It appears the handout was a copy of the LFBC and the COLB stapled together. Nice find on that Politico photo NBC. I had never seen that one. It sure explains why the COLB could be seen on the AP JPG.

  10. “There is something weird, the PDF is 5.9 Mb, the jpeg 294 Kb…”

    You totally missed the thumbnail Dude. Doesn’t your MAC OS open Photoshop thumbnails on your Explorer page (or what ever you call Explorer on a MAC OS — probably Kitty or something like that)

    I believe the thumbnail is in the form of XMP <-<<<>>>- and is part of the Adobe XMP METADATA. I tripped on this possibility when I first tried to load the PDF file’s METADATA into your comment editor. Your editor opened the thumbnail image file and wrote the image contents to the comment window. What a mess! Your comment editor is both a Black Hole and a Volcano.

    I know there is a file thumbnail because the image thumbnail opens in Windows explorer and is assigned to Photoshop.

    The 200PPI x 200PPI PDF file is 5.9 Mb.

  11. P.S.

    Your editor ate the following text which was in the arrowheads to the right of the first occurrence of “XMP” above.

    [ xmpGImg:image ]

  12. The 200PPI x 200PPI PDF file is 5.9 Mb.

    Makes far more sense. Yes.. The 54Mbyte resulting text file shows two embedded DCTDecode images.

  13. Your comment editor is both a Black Hole and a Volcano.

    Do you need contact information for WordPress Customer Support🙂

  14. NBC

    “Or 10:01:36 DC Time (the -6:00 shows that it was created in a time zone consistent with Iowa, the location of the newspaper in question). So the white version was created from the bluish one.”

    The GMT Offset in the METADATA is -5:00 not -6:00

    The computer clock could have been on EST rather than DST. That’s a common problem on some computers.

    Besides your new scenario is ridiculous.

    Everybody knows that the AP supplies photographs to the newspapers.

    According to your alternate scenario the Muscatine Journal had to have in their possession a paper copy of the LFCOLB on white paper by 10 AM local time on 04/27/2011. Otherwise how could they have scanned the PDF to Photoshop ?

    Also the PDF METADATA is full of credits to AP, Applewhite and the AP caption writer but doesn’t mention anyone with the Muscatine Journal.

    Also the PDF opens in PDF XChange Viewer on the Muscatine Journal web site. That particular viewer is not widely used on web sites. I happen to have PDF XChange Viewer Pro and the new PDF XChange Editor Pro. The Muscatine Journal PDF opens at the low resolution of 120 PPI x 120 PPI in both of these viewers. I know why it does that. It does so in spite of the fact the scanned image was produced at 200 PPI x 200 PPI. The page dimensions for the two different resolutions are the same.

    So why would the Muscatine Journal create a 1046 Kb scan to Pdf file at 200 PPI x 200 PPI and then post it at 120 PPI x 120 PPI ?

    No newspaper that posts photographs daily to be downloaded from PDF XChange Viewer is going to make that mistake.

    I don’t know where you got 5 Mb for the file-size for the Muscatine Journal PDF ???

    I don’t think you want to hang the creation of the only Photoshop scan to PDF image of the Obama LFCOLB PDF on the Muscatine Journal.

    The Muscatine Journal PDF is an important image because it was produced with Photoshop at a uniform resolution of 200 PPI x 200 PPI. This resolution is between the 150 PPI x 150 PPI for the WH LFCOLB background layer and the 300 PPI x 300 PPI of the non-background layers. It is also the only image with uniform Grayscale on a near-white background. It also has distinct color fringes which indicate that it is a scanned image. Finally the page-size is only slightly greater than letter size in width. The page measures 8.695 in. x 10.355 in. The page size is the same at 120 PPI x 120 PPI as it is for 200 PPI x 200 PPI. These findings are extremely unusual for a PDF file.

  15. The Muscatine Journal PDF is an important image because it was produced with Photoshop at a uniform resolution of 200 PPI x 200 PPI. This resolution is between the 150 PPI x 150 PPI for the WH LFCOLB background layer and the 300 PPI x 300 PPI of the non-background layers. It is also the only image with uniform Grayscale on a near-white background. It also has distinct color fringes which indicate that it is a scanned image.

    A much simpler explanation can be shown to recreate the same jpeg and I outlined it above.

    The color fringes are, I believe, caused by adjustments to the color curves to compensate for the white balance (bluish tint) in the original.

    If you believe it to be at a 200x200ppi resolution, just like the AP document then why does it have almost half the pixels? Yes, I do realize that the document is much larger. Perhaps you can extract the original content, then we could do a direct comparison to the AP document.

    I have shown above how you can turn the AP document into a Muscatine document. I will see if I can recreate the color level adjustments.

    A much simpler and more elegant explanation than yours, which also explains a lot of the features. Yes, there is contradictory metadata about the time, one entry showing 10AM DC time, the other 9AM DC time.
    What I believe happened is that when the document was imported into Photoshop it took on the metadata of the original file. Sadly enough I do not have a way to document this.
    The fact that the page shows metadata that the Photoshop data was modified at 10:00AM DC time strongly supports my position.

    I will report on my color matching attempts.

    Fun fun fun

  16. NBC

    “On Apr 27, 2011, at 8:54:21 AM, Scott Applewhite created a jpeg from a TIIF of hand-out provided by the White House.”

    This makes no sense at all ! How can a printed paper copy be a TIFF image file? And if you meant to say that the reporter’s handout paper copy was printed from a TIFF image file, then just exactly how do you know that ? You would be the only one on the planet outside of the White House who knows this tidbit.

    So are you claiming that your JPEG file was produced from a paper copy or from a TIFF file ?

  17. NBC

    NBC posted the purported METADATA for the Muscatine PDF file.

    “The newspaper PDF shows the following metadata
    /AdobePhotoshop
    /LastModified “(D:20110427090136-06’00’)”
    /LastModified “(D:20110427090139-06’00’)””

    “Or 10:01:36 DC Time (the -6:00 shows that it was created in a time zone consistent with Iowa, the location of the newspaper in question). So the white version was created from the bluish one.”

    So I just downloaded a fresh copy of the PDF from the Muscatine URL here:

    http://muscatinejournal.com/pdf_6a633f26-70d9-11e0-8729-001cc4c002e0.html

    So here is the correct METADATA from the Muscatine PDF.

    /2011-04-27T09:00:38-05:00
    /2011-04-27T09:01:39-05:00
    /2011-04-27T09:01:39-05:00
    / Adobe Photoshop CS2 Windows

    Notice that the GMT offset is -5:00 not -6:00 as NBC claims. The ModifyDate is approximately 1 min after the CreateDate. The METADATADate and the ModifyDate are the same.

  18. P.S.

    The comment box ate all of my METADATA labels. So you can download the PDF, open it in Illustrator and then look at File/File Info.

  19. NBC

    “Regardless, the blue and white document completely overlap, so again, I am not sure why Hermitian has to conclude: forgery.”

    HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

    Well this post is just to correct your errors. You are comparing apples and oranges. The actual resolution of each image before any scalings is 200 PPI x 200 PPI.

    So garbage-in-garbage-out.

    It’s not anybody’s fault but yours that you don’t know how to properly analyze a Photoshop PDF document.

  20. NBC

    You probably don’t know that the JPEG files that are created by the Xerox 7655 for printing are contained in TIFF wrappers. Consequently it’s always a TIFF file that is placed into the PDF anytime that a JPEG is involved. This would include print to PDF files.

    And you might want to look at the capabilities of the “scan to PC software” that was supplied with the 7655. The scan to PC software for the 7655 supports OCR but there is no mention of MRC capability.

    The file formats supported for scanned documents include:

    TIFF 6.0 with G3 MH, G4 MMR, or JPEG compression (single page or multiple pages)

    PDF (image only and searchable PDFs with G3 MH, G4 MMR, MRC, JPEG, or JBIG2 compression).

    JFIF with JPEG compression

    Microsoft XML Paper Spec (XPS).

    Compression formats available only for the scan to Export option
    Export compression formats:
    MRC (multilayer)
    JBIG2

    Looks like mixing of different compression algorithms such as MRC and JBIG2 is not provided for.

    Also only JFIF format with JPEG compression is provided.

    Scan to JPEG is not a supported scanning file format. In other words JPEG compression can be selected but JPEG file format cannot be. JFIF format is provided.

    The first question that NBC must address is whether his JPEG extraction tool has been tested to be fully compatible with the JFIF standard ?

    Apparently he is trying to extract a JPEG file from within the Xerox PDF file when no JPEG file exists within the PDF file.

    Of course the 7655 is no longer sold as new by Xerox. The shopper is diverted to the 7800 series Xerox MFPs. This 7655 only supports WIN 2000, WIN XP, and WIN Server 2003 and WIN Vista. It does not support either WIN 7 or WIN 8. However it does support Mac OS 8.x, 9.x, OS X, 10.3 and higher.

    The standard model of the 7655 is a copier only, print, scan and FAX are optional. In scan mode a 600 dpi x 600 dpi scanned image can be software converted to output at resolutions up to 2400dpi x 2400 dpi.

    However, unlike the 7655, the Xerox 7535 MFP is still sold by Xerox.

    Somebody needs to do NBC’s homework for him !

  21. Apparently he is trying to extract a JPEG file from within the Xerox PDF file when no JPEG file exists within the PDF file.

    That is funny since the PDF clearly shows the JPEG file… Fascinating… So the Xerox PDF looks exactly like the WH PDF and yet Hermitian finds excuses to ignore these facts.

    MRC compression is mentioned in the manuals IIRC. And yes, they obviously support MRC compression, just look at the WH Tax form PDF.

    Sigh…

  22. As to the -6:00

    Check out the pdf at line 30

    8 0 obj<</Metadata 5 0 R/Pages 4 0 R/Type/Catalog>>
    endobj
    9 0 obj<</CropBox[0.0 0.0 626.04 745.56]/Parent 4 0 R/Contents 10 0 R/Rotate 0/PieceInfo<</AdobePhotoshop<</Private 12 0 R/LastModified(D:20110427090136-06’00’)>>>>/ArtBox[0.0 0.0 626.04 745.56]/MediaBox[0.0 0.0 626.04 745.56]/Thumb 3 0 R/Resources<</XObject<</Im0 11 0 R>>/ProcSet[/PDF/ImageC]>>/Type/Page/LastModified(D:20110427090139-06’00’)>>

    Your research is getting sloppier and sloppier by the second… Come on Hermitian, this is not that hard to get right.
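For anyone who wants to check the raw values without a high level tool, the `/LastModified` entries can be pulled out of the PDF text with a small script. This is only a sketch: the regular expression and function names are my own, it handles only date strings that carry an explicit offset as in the excerpt above, and it also accepts the typographic apostrophes that the comment box substitutes for the straight ones found in the actual file.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string such as D:20110427090136-06'00'
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})['\u2019](\d{2})"
)


def parse_pdf_date(text):
    """Parse the first PDF date string in `text` into a timezone-aware datetime."""
    match = PDF_DATE.search(text)
    if match is None:
        raise ValueError("no PDF date string with an explicit offset found")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    sign = 1 if match.group(7) == "+" else -1
    offset = sign * timedelta(hours=int(match.group(8)), minutes=int(match.group(9)))
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


def last_modified_dates(pdf_text):
    """Return every /LastModified date found in the raw PDF text."""
    entries = re.finditer(r"/LastModified\s*\(D:[^)]*\)", pdf_text)
    return [parse_pdf_date(entry.group(0)) for entry in entries]


# Fragment of the page object quoted above.
sample = (
    "/PieceInfo<</AdobePhotoshop<</Private 12 0 R"
    "/LastModified(D:20110427090136-06'00')>>>>"
    "/Type/Page/LastModified(D:20110427090139-06'00')>>"
)

for stamp in last_modified_dates(sample):
    print(stamp.isoformat(), "->", stamp.astimezone(timezone.utc).isoformat())
```

To run it on the file itself, the bytes can be read with `open(path, "rb").read().decode("latin-1")` so that binary streams do not break the decoding.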

  23. The comment box ate all of my METADATA labels. So you can download the PDF, open it in Illustrator and then look at File/File Info.

    Missing out on all the PDF data again by relying on high level tools….
    Sigh… Did you even read my postings outlining what I found? Or do you just ignore the facts?

  24. NBC

    “If you believe it to be at a 200x200ppi resolution, just like the AP document then why does it have almost half the pixels? Yes, I do realize that the document is much larger. Perhaps you can extract the original content, then we could do a direct comparison to the AP document.”

    I don’t just believe that it is a 200 PPI x 200 PPI image; I know that it is. The Photoshop PDF METADATA indicate that the Muscatine image was created as a TIFF at 200 PPI X 200 PPI and either scanned directly to Photoshop CS2 or scanned to a TIFF Bitmap file and then opened in Photoshop CS2. The file was then saved as a Photoshop PDF file.

    The TIFF image data from the image info panel in Photoshop CC is:
    W = 8.695 in. ; H = 10.355 in.

    The TIFF image was created at:
    200 PPI x 200 PPI per the PDF METADATA. This resolution was verified by setting the Photoshop CC screen grid to 200 PPI x 200 PPI and examining the image at 6400% zoom. All image pixels are congruent to the grid lines.

    The TIFF IMAGE dimensions in pixels as read from the PDF METADATA are:
    PixelXDimension = 1739 P
    PixelYDimension = 2071 P

    Thus:
    1739 P / 200 PPI = 8.695 in.
    and:
    2071 P/ 200 PPI = 10.355 in.

    So maybe you could explain how the Muscatine PDF image was created by reducing the page-size of the AP PDF down to the size of the Muscatine PDF without increasing the pixel resolution of the scaled AP PDF above 200 PPI x 200 PPI?

  25. NBC – “MRC compression is mentioned in the manuals IIRC. And yes, they obviously support MRC compression, just look at the WH Tax form PDF.”

    From the 7655 Administrator Manual

    PDF & PDF/A Settings
    Select Optimized for Fast Web Viewing if you want to create linearized PDF files. Linearized PDF files allow the first page of the PDF file to be displayed in a user’s Web browser, before the entire file is downloaded from the Web server. This fast first page display helps to alleviate Internet user frustration in waiting for an entire file to download before displaying the file’s contents.

    Select MRC Compression if you want to use Mixed Raster Content (MRC) compression. MRC is used to divide the scanned image based on content, and then compress each area in the optimal manner for that image area. This option allows for smaller output files with better image quality.

    So it would be set on the Xerox by the Administrator.

    http://download.support.xerox.com/pub/docs/WC7655_WC7665/userdocs/any-os/en/WC-7655-7665-7675_SAG-final.pdf

  26. I am glad that some people can do the trivial research here. Hermitian appears to have abandoned patience in favor of quick judgments. I will have to write another posting tonite to correct his many misunderstandings.

    Well, at least it helps me strengthen the case for the Xerox workflow, which has become quite unassailable.

    But perhaps Hermitian will be able to print to PDF the xerox file using preview on OS/X and find what I found…

    Or perhaps I will take pity on him for being unable to do these rather trivial tasks and share the PDF data. But of course, how would he know that I did not change the data?…

    A diligent researcher, I believe, would not rely on such second hand data.

  27. This makes no sense at all ! How can a printed paper copy be a TIFF image file? And if you meant to say that the reporter’s handout paper copy was printed from a TIFF image file, then just exactly how do you know that ? You would be the only one on the planet outside of the White House who knows this tidbit.

    So are you claiming that your JPEG file was produced from a paper copy or from a TIFF file ?

    Think before you react. Scott Applewhite took a picture of the document and converted the raw format to TIFF or directly to JPEG. Come on Hermitian, you are getting a bit silly here.

    We already know that the documents were handed out before Applewhite took the photograph and sent it to the DC office.

  28. I don’t just believe that it is a 200 PPI x 200 PPI image; I know that it is. The Photoshop PDF METADATA indicate that the Muscatine image was created as a TIFF at 200 PPI X 200 PPI and either scanned directly to Photoshop CS2 or scanned to a TIFF Bitmap file and then opened in Photoshop CS2. The file was then saved as a Photoshop PDF file.

    And you never have heard of downsampling to a JPEG… OMG… It was imported into CS2 by the image conversion plugin which likely took the original JPEG, transformed it into a TIFF 200×200 and then it was saved as a JPEG of much lower resolution.

    Just look at the images side by side… The Muscatine PDF is of lower resolution quite clearly. You are relying too much on your tools my friend…

    Sigh… The JPEG inside the AP document was indeed 200 PPI and thus importing it into a 200 PPI TIFF makes some sense but then it was downsampled and in fact the quality was further reduced.

  29. The first question that NBC must address is whether his JPEG extraction tool has been tested to be fully compatible with the JFIF standard ?

    Apparently he is trying to extract a JPEG file from within the Xerox PDF file when no JPEG file exists within the PDF file.

    I extracted the binary datastream manually using an editor to check that the tools did not introduce the comments.

    I am quite familiar with due diligence and triple checking my work.

    It’s a good way to do research. Perhaps you would like to hear more?
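The manual extraction can also be approximated in a script, which makes the check easy to repeat. The sketch below is an assumption-laden shortcut rather than a proper PDF parser: it looks for `stream` ... `endstream` pairs in the raw bytes and keeps the bodies that begin with the JPEG start-of-image marker (FF D8). A non-greedy match can cut a stream short if the binary data happens to contain the word `endstream`, so reading the `/Length` entry of the object is the more reliable route.

```python
import re

# Body of a PDF stream object: everything between "stream" EOL and EOL "endstream".
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def find_jpeg_streams(pdf_bytes):
    """Return the bodies of all streams that start with a JPEG SOI marker."""
    return [
        match.group(1)
        for match in STREAM.finditer(pdf_bytes)
        if match.group(1).startswith(JPEG_SOI)
    ]


# Made-up miniature "PDF" with one JPEG-like stream and one text stream.
sample_pdf = (
    b"1 0 obj<</Filter/DCTDecode>>stream\n"
    b"\xff\xd8\xff\xe0abc\xff\xd9"
    b"\nendstream\nendobj\n"
    b"2 0 obj<<>>stream\nhello\nendstream\nendobj\n"
)

for body in find_jpeg_streams(sample_pdf):
    print(len(body), "bytes, ends with EOI:", body.endswith(JPEG_EOI))
```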

  30. And Hermitian, the scan to PC software is irrelevant. In most deployments this is a server based printer/scanner and the email workflow is used.

    Come on my friend, tell me that you are at least somewhat familiar with such devices? Not all scanners are hooked directly to one’s PC.. Those days are long since gone.

  31. Well this post is just to correct your errors. You are comparing apples and oranges. The actual resolution of each image before any scalings is 200 PPI x 200 PPI.

    That is incorrect as anyone with an editor can ascertain for themselves. But let’s look at a side by side. I always like to educate someone as to how to do a proper comparison
