Extracting the WH 7535 PDF JPEG

Posted on August 7, 2013 by NBC

Our friend Hermitian is having trouble finding the ‘YCbCr’ comment tag in the JPEG embedded in the wh-lfbc-scanned-xerox-7535-wc.pdf file. I previously outlined how to do this:

qpdf --show-object=12 --raw-stream-data wh/wh-lfbc-scanned-xerox-7535-wc.pdf >stream12.png

Note that the png extension is used to ‘fool’ WordPress into accepting the document. You can download and save it.

Remember Obj 12?

obj 12 0
 Type: /XObject
 Referencing: 32 0 R
 Contains stream

  <<
    /BitsPerComponent 8
    /ColorSpace /DeviceRGB
    /DecodeParms [null 32 0 R]
    /Filter [/FlateDecode/DCTDecode]
    /Height 1280
    /Length 231258
    /Subtype /Image
    /Type /XObject
    /Width 1664
  >>

This will extract Obj 12 as a raw data stream. If you cannot get qpdf to work, you can use a hex editor to extract the Obj 12 stream from the PDF directly.
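If neither qpdf nor a hex editor is at hand, the same raw stream can be cut out with a few lines of Python. This is only a minimal sketch: the function name `extract_raw_stream` is made up for illustration, and it assumes a simple PDF in which the object is not stored inside an object stream and `/Length` is a direct integer (as it is for Obj 12 above).

```python
import re


def extract_raw_stream(pdf_bytes, obj_num, gen=0):
    """Return the still-encoded stream bytes of one indirect PDF object.

    Hypothetical helper: assumes a direct integer /Length entry.
    """
    # Find the object header, e.g. b"12 0 obj" (but not b"112 0 obj").
    header = re.search(rb"(?<!\d)%d %d obj\b" % (obj_num, gen), pdf_bytes)
    if header is None:
        raise ValueError("object %d %d not found" % (obj_num, gen))

    # The stream data begins right after the 'stream' keyword and its line ending.
    keyword = re.compile(rb"\bstream\r?\n").search(pdf_bytes, header.end())
    if keyword is None or b"endobj" in pdf_bytes[header.end():keyword.start()]:
        raise ValueError("object has no stream")

    # Everything between the header and the keyword is the stream dictionary.
    dictionary = pdf_bytes[header.end():keyword.start()]
    length = re.search(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)", dictionary)
    if length is None:
        raise ValueError("no direct /Length entry")

    start = keyword.end()
    return pdf_bytes[start:start + int(length.group(1))]
```

Reading the PDF in binary mode and calling `extract_raw_stream(data, 12)` should give the same bytes that the qpdf command writes out, still Flate-compressed.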

Decompress (inflate) the Flate-encoded stream using

python ./deflate.py stream12.png stream12.jpg

I will even give you the deflate.py script:

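import sys
import zlib

args = sys.argv

if len(args) != 3:
    print("Usage: python " + args[0] + " <input> <output>")
    sys.exit(1)

in_path = args[1]
out_path = args[2]

# Read the raw Flate (zlib) stream that was extracted from the PDF.
with open(in_path, 'rb') as file_read:
    buffer = file_read.read()

# Inflate it; the result is the embedded JPEG, byte for byte.
decomp = zlib.decompress(buffer)

# Write in binary mode so the JPEG bytes are not altered.
with open(out_path, 'wb') as file_write:
    file_write.write(decomp)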

stream12.jpg

jpeginfo -C stream12.jpg

stream12.jpg 1664 x 1280 24bit n/a N 235646 "YCbCr"
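If jpeginfo is not installed, the comment can also be read directly from the file. The sketch below walks the JPEG marker segments that precede the image data and collects every comment (COM, 0xFFFE) segment. The function name `jpeg_comments` is made up for illustration, and the sketch only looks at the header segments before the start-of-scan marker.

```python
import struct


def jpeg_comments(data):
    """Return the payloads of all COM segments found before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    comments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker; skip it.
            pos += 1
            continue
        if marker in (0xDA, 0xD9):
            # SOS (entropy-coded data follows) or EOI: stop scanning.
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xFE:
            comments.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length
    return comments
```

For a file like the stream12.jpg produced above, `jpeg_comments(open("stream12.jpg", "rb").read())` would be expected to include `b"YCbCr"`.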

82 thoughts on “Extracting the WH 7535 PDF JPEG”

W. Kevin Vicklund says:

August 7, 2013 at 19:29

Yep, found the comment. And Hermie fails again.
NBC says:

August 7, 2013 at 19:35

Yep, found the comment. And Hermie fails again.

I do not understand Hermitian. Does he really think that I would somehow post something I would not have double, triple checked?

Let me outline what I have done so far:

1. Extract JPEGs from Xerox WorkCentre and non-Xerox PDFs using the following methods
a. pdfimages
b. Hex editor
2. Use Hex editor and jpeginfo tools to search for embedded comments
3. Investigate the tools used to see if they added the jpeg comments

Check, check and double check, and always doubt your approaches.

Why Hermitian is looking for a JPEG comment in a PDF not created on a Xerox WorkCentre is beyond me but he has been helpful in eliminating yet another candidate 🙂

His doubts and attempts at research do help me reiterate my findings and when they survive, they become yet stronger.

Such is the nature of the scientific method and without Hermitian and people like Vicklund and others, it would not be possible to have created such a solid hypothesis.
Hermitian says:

August 8, 2013 at 11:13

NBC

“Our friend Hermitian is having trouble finding the ‘YCbCr’ comment tag in the JPEG embedded in the wh-lfbc-scanned-xerox-7535-wc.pdf file. I previously outlined how to do this:”

There’s nothing wrong with any of the bitmaps that I have extracted by three different methods. Your “smoking gun” YCbCr just doesn’t happen to be in any of them.

NBC: Hence you must have done something wrong because anyone else can find them following the simple steps I outlined. I bet that you let your high level tools alter the contents… A rookie mistake.

And that’s because your label YCbCr has absolutely nothing to do with the bitmap image directly. However, it does have to do with a conflict: the actual color space that was specified for the background layer differs from the default color space specified by the JFIF standard.

NBC: Nonsense. There is nothing that supports this interpretation

The following line of code from the archive copy of the PDF file “birth-certificate-long-form.pdf” is the only line of code within the WH LFCOLB which contains the YCbCr label.

$4á%ñ &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz‚ƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥|§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚâãäåæçèéêòóôõö÷øùúÿþ

This line of code is identical whether viewed in Text mode, Hex mode or Binary mode.

This line of code resulted from a failed attempt by the forger to create a thumbnail image from an image being uploaded using the PHP language and the GD libraries.

NBC: No evidence of this either

The line of code has absolutely nothing to do with the bitmap file which was being uploaded at the time.
Rather it has to do with a conflict between the actual color space that was specified for the background layer and the YCbCr color space.

This single line of code is non-functional and has no effect on any object within the PDF.

NBC: Exactly, which is why it is such an important indicator

Here’s my first guess as to the problem. Most likely the underlying error was that the forger specified the YCbCr color space for the thumbnail whereas he had used the RGB ICCBased color space for the uploaded JPEG. This color space was assigned over the default color space (which is YCbCr) for the JFIF format. Hence there is a conflict between the color space that the forger had assigned to the background layer and the color space that was identified for the thumbnail. The result was that the thumbnail could not be created.

See: http://php.about.com/od/advancedphp/ss/php_thumbnail.htm

There is nothing about these circumstances that is unique to the Xerox WorkCentre, nor is the YCbCr color space unique to the Xerox WorkCentre.

For the error see:

http://mattoonkawyamktm.com/image.php?source=

and here:

http://answers.microsoft.com/en-us/windows/forum/windows_7-networking/trying-to-open-document/b350e011-32a9-4ad8-8a80-90361ebd246a

and here:

http://www.szejnfeld.pl/show_image_NpAdvHover.php?filename=/2012/12/2_14971_01.jpg&cat=7&pid=14173&cache=false

For other examples search:

info.com/$4á%ñ&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz‚ƒ„

or:

info.com/'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz‚ƒ„

This last string also takes one to web sites which display picture galleries or glossy magazines with turning pages.

See: http://www.pingodoce.pt/media/flipBook/AlimentacaoVerao_2a15Agosto/mobile/index.html

The archived copy of the WH LFCOLB can be downloaded from here:

Click to access birth-certificate-long-form.pdf

I will be working to prove my theory as to the cause of this single line of code in the WH LFCOLB.

NBC: Good luck, you certainly have a lot of work to do… And you have to explain why we find the same code in a Xerox generated scan 🙂

But either way the source has nothing to do with a Xerox Work Center.
Hermitian says:

August 8, 2013 at 11:24

For the reader who is more receptive to new ideas if they involve a little levity.

See: http://jokesareawesome.com/tag/strings
Hermitian says:

August 8, 2013 at 12:16

The underlying structure for the single line of code containing YCbCr is:

struct PDFObj sPDFObj[14] = 70 obj
//<>
stream
xxxxx
Byte Data [299519]
Byte Data [—] = xxx

——————————-
obj {11 0 R} is under color space cs1

obj {11 0 R} contains
/ICCBased
Reference to object {29 0 R}

obj {29 0 R} includes
Alternate = DeviceRGB
Length = 2905
N = 1
Filter = /FlateDecode

YCbCr is nowhere to be seen.
W. Kevin Vicklund says:

August 8, 2013 at 13:38

You’re looking at the wrong objects, Hermie. Take the stream from object 12 and deflate it. Not object 14, not object 11, not object 29. Object 12. The only object using DCTDecode.
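To check which objects in a PDF actually use DCTDecode, a rough scan of the object dictionaries is usually enough. The sketch below is an assumption-laden shortcut, not a PDF parser: the name `objects_using_filter` is made up, and the regular expression only handles plain, uncompressed object dictionaries (no object streams).

```python
import re

# Matches "N G obj", then lazily captures the dictionary text up to the
# 'stream' keyword or 'endobj', whichever comes first.
OBJ_RE = re.compile(
    rb"(?<!\d)(\d+) (\d+) obj\b(.*?)(?:\bstream\b|\bendobj\b)", re.DOTALL
)


def objects_using_filter(pdf_bytes, filter_name=b"/DCTDecode"):
    """Return the numbers of objects whose dictionary names the given filter."""
    return [
        int(match.group(1))
        for match in OBJ_RE.finditer(pdf_bytes)
        if filter_name in match.group(3)
    ]
```

Run against the 7535 PDF, this kind of scan would be expected to point at object 12 only, which is where the embedded JPEG lives.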
W. Kevin Vicklund says:

August 8, 2013 at 13:47

Two hands is not enough to describe your fail.
W. Kevin Vicklund says:

August 8, 2013 at 14:20

Wow, that fail is so epic that I’m not even sure what file Hermie is looking at. Maybe the Muscatine Journal pdf? Doesn’t seem to be the WH LFBC pdf, the ABC pdf, or the 7535 pdf.
NBC says:

August 8, 2013 at 15:21

Wow, that fail is so epic that I’m not even sure what file Hermie is looking at. Maybe the Muscatine Journal pdf? Doesn’t seem to be the WH LFBC pdf, the ABC pdf, or the 7535 pdf.

I can only help him so far, he does need to do some work himself.
Hermitian says:

August 8, 2013 at 15:51

W. Kevin Vicklund says:

August 8, 2013 at 14:20

Wow, that fail is so epic that I’m not even sure what file Hermie is looking at. Maybe the Muscatine Journal pdf? Doesn’t seem to be the WH LFBC pdf, the ABC pdf, or the 7535 pdf.

As expected WKV and NBC remain clueless.
1. The label YCbCr is not unique to your Xerox Forger
2. Your Xerox Forger did not place the YCbCr label in the WH LFCOLB PDF
3. You now need to show how/why your Preview attempted to create a thumbnail when the bitmap was uploaded.
4. The single line of code containing the YCbCr label in the WH LFCOLB is evidence of tampering by a human.
W. Kevin Vicklund says:

August 8, 2013 at 16:11

As expected WKV and NBC remain clueless.

Then maybe you should tell us what you actually tried to do. Obviously, you didn’t follow the instructions given on where to find the YCbCr comment. So what is it that you are doing, exactly?

1. The label YCbCr is not unique to your Xerox Forger

And yet the only JPEG files we have ever found with that comment were generated by Xerox WorkCentres. It is certainly possible that it is not unique, but it is certainly rare enough to be strong evidence.

2. Your Xerox Forger did not place the YCbCr label in the WH LFCOLB PDF

No, it was placed in the JPEG embedded in the PDF. It was not placed in the PDF itself. Looking at the PDF without extracting the JPEG is incorrect.

3. You now need to show how/why your Preview attempted to create a thumbnail when the bitmap was uploaded.

What are you blathering about? There’s no evidence that Preview attempted to create a thumbnail. Are you talking about the Muscatine Journal pdf again?

4. The single line of code containing the YCbCr label in the WH LFCOLB is evidence of tampering by a human.

And yet every Xerox WorkCentre pdf we’ve examined that has MRC with multiple monochrome layers also has an embedded background JPEG with the YCbCr comment. For example, the file that is the topic of the thread that you’ve once again failed to properly examine.
W. Kevin Vicklund says:

August 8, 2013 at 16:20

Hermie, why are you so resistant to extracting the JPEG in the 7535 pdf and examining it for the comment ‘YCbCr’? The procedure has been explained to you many times. You could even get your own DeFlating tool if you distrust the one used by NBC. Are you afraid that you’ll be proven wrong?
Hermitian says:

August 8, 2013 at 16:43

NBC

“”2. Your Xerox Forger did not place the YCbCr label in the WH LFCOLB PDF””

“No, it was placed in the JPEG embedded in the PDF. It was not placed in the PDF itself. Looking at the PDF without extracting the JPEG is incorrect.”

So now we are playing by your rules ?

Any method of extraction which yields a viable bitmap image file is acceptable. You don’t get to reject other valid results just because they contradict your storyline.

I’ve been trying to tell you and Vicklund for weeks that the YCbCr label is in the archive copy:

Click to access birth-certificate-long-form.pdf

but does not appear in your Xerox 7535 PDF “wh-lfbc-scanned-xerox-7535-wc.pdf”.

Here is the code that blows your Xerox forger to pieces:

%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyzƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚáâãäåæçèéêñòóôõö÷øùúÿÄ
$4á%ñ&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz‚ƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚâãäåæçèéêòóôõö÷øùúÿþ

This code occupies lines 614-616 of the archived WH LFCOLB. You can select these lines in either the Text mode, HEX mode or Binary mode and the same above lines of code results in all three modes.

Take a long look at the proof. This is a very common problem that many admins and coders have experienced. Their pleas for help are all over the internet.

So the bottom line is that I am 99% sure that your YCbCr label was caused by a failed attempt to create a thumbnail from your precious JPEG when it was uploaded. This action utilized PHP code and the PHP GD Libraries. I believe the attempt failed because the forger specified the wrong color space for the thumbnail.
W. Kevin Vicklund says:

August 8, 2013 at 17:24

Any method of extraction which yields a viable bitmap image file is acceptable. You don’t get to reject other valid results just because they contradict your storyline.

Wrong. The method of extraction must not only produce a valid JPEG, it must be a lossless decompression that doesn’t alter the output. If it adds pixels, it is not acceptable. If it changes the comments, it is not acceptable. It must reproduce the original JPEG exactly. Of course you haven’t shown any evidence that you’ve actually done any extractions whatsoever.
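One simple way to test whether an extraction method reproduces the original exactly is to compare hashes of the byte streams. The sketch below uses made-up stand-in bytes rather than the real file, and the helper names are hypothetical; it only illustrates that a Flate round trip returns identical bytes, while any change to the comment shows up as a different hash.

```python
import hashlib
import zlib


def sha256_hex(data):
    """Hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def same_bytes(original, extracted):
    """True only if the extraction reproduced the original byte for byte."""
    return sha256_hex(original) == sha256_hex(extracted)


# Stand-in for a JPEG byte stream; these bytes are invented for illustration.
jpeg_bytes = b"\xff\xd8\xff\xfe\x00\x07YCbCr\xff\xd9"

# Flate is lossless: compressing and decompressing returns the same bytes.
flate_stream = zlib.compress(jpeg_bytes)
roundtrip = zlib.decompress(flate_stream)

# A tool that rewrites the comment would fail the comparison.
altered = jpeg_bytes.replace(b"YCbCr", b"XXXXX")
```

Here `same_bytes(jpeg_bytes, roundtrip)` holds, while `same_bytes(jpeg_bytes, altered)` does not.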

This is a very common problem that many admins and coders have experienced. Their pleas for help are all over the internet.

Indeed it is a very common problem. I did a search on the first part of the string:

&’()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyzƒ

And got a bunch of hits. You know what all the advice said?

That it’s a binary stream and you have to decompress it.

So why are you refusing to decompress it?
W. Kevin Vicklund says:

August 8, 2013 at 17:39

It should also be noted that this is not a colorspace that we are looking for in the extracted JPEG file, but rather a comment right at the very beginning.
W. Kevin Vicklund says:

August 8, 2013 at 17:51

(As a sidenote, I am aware that DCT is a lossy compression. When I’m talking about lossless decompression, I’m referring to a process where if you recompress after decompressing, you get the same file as before the decompression. In calculus, this is like differentiating a polynomial (compression) and then integrating it (decompression): you know you’ve lost any constants, but you can continue to differentiate and integrate without losing any additional information.)
NBC says:

August 8, 2013 at 18:09

but does not appear in your Xerox 7535 PDF “wh-lfbc-scanned-xerox-7535-wc.pdf”.

So you did not follow my steps… It cannot be seen in the PDF because the object is FlateDecode/DCTDecode

Come on Hermitian, stop embarrassing yourself.
W. Kevin Vicklund says:

August 8, 2013 at 18:30

Hold on, I was letting myself get distracted by Hermie’s refusal to extract the JPEG from the 7535 file. To get the 7535 comment, you have to DeFlate the stream for Object 12. This will give you a binary stream in DCT, or JPG format. In the 7655 file, the stream is already present as DCT.

Once you have a stream in DCT format, you can search for the YCbCr comment – I had forgotten that I had saved the stream from the WH LFBC as a separate file, so you can indeed find it in the WH LFBC without any additional work. That error is mine (though it should be noted that it is only found in the binary stream of the embedded jpeg). To verify that it is a comment, however, it must be preceded by ÿþ

I have yet to see any indication that Hermie has even attempted to extract the JPEG stream from the 7535 file.
NBC says:

August 8, 2013 at 20:12

I have yet to see any indication that Hermie has even attempted to extract the JPEG stream from the 7535 file.

Well, that’s too hard perhaps and instead he focuses on irrelevant approaches.

Funny stuff.
RoadScholar says:

August 8, 2013 at 20:34

Hot damn you guys are smart. I swerved away from hard tech several decades ago. I’m SO glad to see this sort of rigorous thinking; it dispels my gloom about watching patience, rationality and dedication vanishing all around me.
NBC says:

August 8, 2013 at 21:38

Hot damn you guys are smart.

Perhaps it’s not that we are smart but that… Well you understand where I am going here 🙂
Hermitian says:

August 8, 2013 at 22:36

Unbridled arrogance always precedes a big fail !
NBC says:

August 8, 2013 at 22:59

Unbridled arrogance always precedes a big fail !

That’s so ironic my friend. But why is it that others have no problem extracting the JPEG and the data?…

I provided you with all the necessary steps and still you cannot do it properly.

Sigh…
W. Kevin Vicklund says:

August 8, 2013 at 23:01

Unbridled arrogance always precedes a big fail !

Indeed, and I can think of no better exemplar of this principle than Henry Blake.
NBC says:

August 8, 2013 at 23:03

It’s funny how Hermitian has shown himself to be totally confused… He looks for the YCbCr comment tag in documents that should not have it, and cannot find it in documents that have the object double encoded. All you need to do is extract the object using a hex editor and run it through deflate.py or a zlib tool.

It ain’t rocket science but somehow…
Hermitian says:

August 9, 2013 at 01:12

NBC

I ran your claim of exclusivity of the YCbCr label to the Xerox Workcenter by an expert who knows better. After much ROTFL he told me to stop wasting my time with you two losers.

I will be running your ridiculous claim by two more experts during the next several days. I can hardly wait.
NBC says:

August 9, 2013 at 01:18

I will be running your ridiculous claim by two more experts during the next several days. I can hardly wait.

And yet, the tag is there, and you cannot even find it 🙂

Have you found any non-Xerox documents that contain this tag?

Of course not.

You really have a strange opinion about ‘experts’ but I guess it’s all relative
NBC says:

August 9, 2013 at 01:21

I will be running your ridiculous claim by two more experts during the next several days. I can hardly wait.

Hermitian is setting himself up for fail number xxx 🙂

What he fails to understand is that the YCbCr comment found in the WH LFBC is matched with the same comment in the Xerox document. Even if it were not exclusive to Xerox, it is strong circumstantial evidence and fully in support of my workflow.

You are so clueless, my friend.
W. Kevin Vicklund says:

August 9, 2013 at 01:43

Ask him to locate a JPEG with that comment in it, one not created by a Xerox WorkCentre (and not created by hand to cheat). If he can’t do that, his opinion is worthless. Of course, I happen to believe you are lying about his existence, but here’s your chance to prove NBC wrong. Frankly, I wouldn’t be surprised to find that there exists -somewhere- a file with that characteristic.

We’ve located dozens with the comment, all from Xerox WorkCentres. We’ve also tested thousands not created by Xerox Workcentres, and not even one had the comment. Even if it’s not unique, it’s rare enough to be a strong indicator.

That’s how evidence works. Each individual piece narrows the possibilities, until you’re left (ideally) with only one possibility. Would I conclude, based solely on the presence of a JPEG with a YCbCr comment, that a pdf was initially created by a Xerox WorkCentre? No. I would want more evidence, just in case it isn’t unique.

By the way, what did your alleged expert say when you told him you didn’t think you needed to DeFlate a JPEG compressed with flate? Maybe that’s why he was laughing so hard.
NBC says:

August 9, 2013 at 01:46

By the way, what did your alleged expert say when you told him you didn’t think you needed to DeFlate a JPEG compressed with flate? Maybe that’s why he was laughing so hard.

Just for fun I have posted the preview version of the WH7535 document… Anyone should be able to find the comment in that document, even Hermitian I am sure…

Then again…

Poor Hermitian has come to realize that his ‘forgery’ claims stand no chance against the facts I have presented.

Not that it matters, but it is great fun to do the sleuthing and apply logic and reason to find the little hints that led me to the Xerox WorkCentre as the culprit…

What is even more hilarious is how people had made many predictions based on common sense and many of them came true:

1. Mixed Raster Compression
2. JBIG2
3. Xerox WorkCentre
4. Different fore and background resolution
Hermitian says:

August 9, 2013 at 12:06

WKV

“Once you have a stream in DCT format, you can search for the YCbCr comment – I had forgotten that I had saved the stream from the WH LFBC as a separate file, so you can indeed find it in the WH LFBC without any additional work. That error is mine (though it should be noted that it is only found in the binary stream of the embedded jpeg). To verify that it is a comment, however, it must be preceded by ÿþ”

“I have yet to see any indication that Hermie has even attempted to extract the JPEG stream from the 7535 file.”

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Why would I want to extract the JPEG from the 7535 file when I have already done that four different ways and got the same results ? And I will match my methods with yours anytime. And besides, as I have already stated, I don’t waste my time on doomed projects.

Consequently, I prefer to spend what little time that I have analyzing the archive copy of the WH LFCOLB which can be downloaded from here:

Click to access birth-certificate-long-form.pdf

You really should take a look at this file from time to time because you can learn much more from the real McCoy. And, besides, I don’t need to extract any JPEGS to find your stinking YCbCr label because it’s right there on code line 616.

” To verify that it is a comment, however, it must be preceded by ÿþ””

That’s funny because I thought that “ÿþ” was the BOM for UTF-16 (little endian). Why would your comment be coded in UTF-16 ?

Is there a Chinese version of the WH LFCOLB that I don’t know about ?
W. Kevin Vicklund says:

August 9, 2013 at 13:56

Why would I want to extract the JPEG from the 7535 file when I have already done that four different ways and got the same results ? And I will match my methods with yours anytime. And besides, as I have already stated, I don’t waste my time on doomed projects.

So you claim, but you have never said what those methods were. Did they actually produce a JPEG-formatted file (such as JFIF)? If they didn’t produce a JPEG, then why would you expect to find a JPEG-formatted comment? Did they even produce a viable image at all? If they do produce viable JPEGS, do you know whether they preserve comments found in the original, or do they strip them out?

That’s funny because I thought that “ÿþ” was the BOM for UTF-16 (little endian). Why would your comment be coded in UTF-16 ?

In the JPEG format, it is a comment. See this wikipedia article, which has been pointed out to you many times: http://en.wikipedia.org/wiki/JPEG#Syntax_and_structure

0xFF=ÿ
0xFE=þ
W. Kevin Vicklund says:

August 9, 2013 at 14:47

That’s funny because I thought that “ÿþ” was the BOM for UTF-16 (little endian).

Forgot to add, this is only used to indicate BOM at the very beginning of a file.
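The two readings of the bytes 0xFF 0xFE can be told apart mechanically. This small sketch (helper names are made up, sample bytes are invented) shows that the pair is a UTF-16 little-endian BOM only when it opens a file, whereas a JPEG opens with the SOI marker 0xFF 0xD8 and may carry 0xFF 0xFE later on as a comment marker.

```python
import codecs


def starts_with_utf16_le_bom(data):
    """True if the very first two bytes are the UTF-16 LE byte order mark."""
    return data.startswith(codecs.BOM_UTF16_LE)


def starts_like_jpeg(data):
    """True if the data opens with the JPEG start-of-image marker."""
    return data.startswith(b"\xff\xd8")


# Invented sample: SOI, then a COM segment holding the text YCbCr.
sample = b"\xff\xd8\xff\xfe\x00\x07YCbCr"

# Viewed as Latin-1 text, the marker bytes show up as the two characters
# that a text editor displays.
marker_as_text = b"\xff\xfe".decode("latin-1")
```

The sample passes `starts_like_jpeg` and fails `starts_with_utf16_le_bom`, even though it contains the same two bytes further in.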
Hermitian says:

August 9, 2013 at 16:51

Has anybody established the order of FlateDecode/DCTDecode for the Xerox Workcenter files ?

Is it
FlateDecode then DCT/Decode
or
DCT/Decode then FlateDecode ?

If you have determined the order, how did you verify your findings ?

Then has anybody determined what, if any, decompression/re-compression (or just re-compression) Preview applies to the Xerox-generated PDFs ?
Hermitian says:

August 9, 2013 at 18:00

Interesting details from WKV’s Wikipedia JPEG link :

See: http://en.wikipedia.org/wiki/JPEG#Syntax_and_structure

“Lossless editing[edit]

See also: jpegtran and Commons:User:Cropbot

A number of alterations to a JPEG image can be performed losslessly (that is, without recompression and the associated quality loss) as long as the image size is a multiple of 1 MCU block (Minimum Coded Unit) (usually 16 pixels in both directions, for 4:2:0 chroma subsampling). Utilities that implement this include jpegtran, with user interface Jpegcrop, and the JPG_TRANSFORM plugin to IrfanView.

Blocks can be rotated in 90 degree increments, flipped in the horizontal, vertical and diagonal axes and moved about in the image. Not all blocks from the original image need to be used in the modified one.

The top and left edge of a JPEG image must lie on a 8 × 8 pixel block boundary, but the bottom and right edge need not do so. This limits the possible lossless crop operations, and also prevents flips and rotations of an image whose bottom or right edge does not lie on a block boundary for all channels (because the edge would end up on top or left, where – as aforementioned – a block boundary is obligatory).

When using lossless cropping, if the bottom or right side of the crop region is not on a block boundary then the rest of the data from the partially used blocks will still be present in the cropped file and can be recovered.

It is also possible to transform between baseline and progressive formats without any loss of quality, since the only difference is the order in which the coefficients are placed in the file.

Furthermore, several JPEG images can be losslessly joined together, as long as the edges coincide with block boundaries. jpeg supports 12-bit and 32-bit color as RGB.”

Three details here are very significant.

First, an MCU (Minimum Coded Unit) block is 16 px × 16 px. Lossless editing can be performed on the JPEG image if and only if the image size is a whole number of MCU blocks (in both width and height).

Second the top and left edges of a JPEG image must lie on a 8 x 8 pixel block boundary, but the bottom and right edges need not. This is evidently required to permit JPEG compression of the image file.

Each of the nine image layers of the WH LFCOLB are rotated 90 degrees counterclockwise for their final orientations before the image is opened in Adobe Illustrator CS6 or CC. A rotation of 90 degrees clockwise is automatically applied to each image layer when the file “birth-certificate-long-form.pdf” is opened with either version of Illustrator. Consequently if the image boundaries touch on 8 x 8 blocks on the top and right sides of each rectangular object boundary, then the left and top edges will touch on 8 x 8 blocks before the WH LFCOLB PDF is opened in Illustrator. This is the orientation of each image layer when the JPEG compression was applied. See “Analysis of Rectangular Object Boundaries”:

http://www.scribd.com/doc/151738307/Analysis-of-Rectangular-Object-Boundaries

For this conclusion to be valid, all rotations must be applied between two x, y coordinate systems rather than between one fixed coordinate and an image object. Fortunately, this interpretation of the meaning of rotations is specified in PDFReferenceXX.pdf. In this interpretation, each object is rigidly attached to an x,y coordinate system and moves with the system. Hence the WH LFCOLB image is in exact compliance with both the JPEG and PDF standards. To the contrary, the Xerox Workcenter generated image layers do not satisfy all of the conditions of either specification.

Third, the JPEG standard is not a file format standard but rather a file compression standard. The JPEG file format standard is the JFIF standard. The facts are in agreement with the Xerox Workcenter 7655 specifications which require the JFIF file format and JPEG compression.

A subtle fact attaches to this combination of requirements. A FlateDecoded file is not a JPEG compressed file. Consequently, such a file has to be a JFIF formatted file. Then strict adherence to all of the standards would require that the JPEG label would be incorrect for a FlateDecoded JFIF file.
RoadScholar says:

August 9, 2013 at 18:07

“I will be running your ridiculous claim by two more experts during the next several days. I can hardly wait.”

Any… day… now…
NBC says:

August 9, 2013 at 18:14

Why would I want to extract the JPEG from the 7535 file when I have already done that four different ways and got the same results ?

Because none of your methods likely maintain the integrity of the jpeg? That’s the problem with high level tools

That’s funny because I thought that “ÿþ” was the BOM for UTF-16 (little endian). Why would your comment be coded in UTF-16 ?

Because that’s the JPEG standard. Are you really that unable to do the research as to how JPEG works?…

Shocking…
NBC says:

August 9, 2013 at 18:15

Forgot to add, this is only used to indicate BOM at the very beginning of a file.

Correct, and why Hermitian believes that the code is unique to Unicode is beyond me. As you have pointed out, anyone could have checked with Wikipedia and realized that 0xFF markers have a special meaning and that the one you showed means: JPEG comment and is followed by 2 bytes indicating the length.
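As a worked example of that layout, a single comment segment can be built and taken apart with `struct`. The helper names below are made up for illustration; the one detail worth noticing is that the 2-byte length counts itself, so the five characters of YCbCr give a length field of 7.

```python
import struct


def build_com_segment(text):
    """Marker 0xFFFE, 2-byte big-endian length (including itself), payload."""
    payload = text.encode("ascii")
    return b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload


def parse_com_segment(segment):
    """Return the comment text of a segment laid out as above."""
    marker, length = struct.unpack(">HH", segment[:4])
    if marker != 0xFFFE:
        raise ValueError("not a COM segment")
    # The payload runs from just after the length field to marker + length.
    return segment[4:2 + length].decode("ascii")
```

The two length bytes (0x00 0x07) are not printable, which is why they do not show up when the stream is pasted as text.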
NBC says:

August 9, 2013 at 18:17

Is it
FlateDecode then DCT/Decode
or
DCT/Decode then FlateDecode ?

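That’s simple: the /Filter array lists the filters in the order they are applied when decoding, so you FlateDecode first and DCTDecode second. Put the other way around, the image was DCT-compressed first and the result was then deflated; deflating first and DCT-compressing afterwards would be a meaningless action.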

Preview just uses DCTDecode, but it does not recompress the JPEG as this would lead to additional losses in quality. Instead Preview just ‘copies’ the DCTDecode data.

Come on Hermitian this is PDF 101
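In PDF, the /Filter array names the filters in the order in which they are applied to decode the stream. For the [/FlateDecode/DCTDecode] chain on Obj 12, that means inflating first and treating what comes out as JPEG data. A minimal sketch of that unwrapping, with a made-up function name and support for only these two filters:

```python
import zlib


def unwrap_to_jpeg(raw_stream, filters):
    """Undo the filters ahead of DCTDecode and return the JPEG bytes untouched."""
    data = raw_stream
    for name in filters:
        if name == "FlateDecode":
            # Lossless: inflating gives back exactly what was deflated.
            data = zlib.decompress(data)
        elif name == "DCTDecode":
            # What remains is the JPEG byte stream itself; decoding it to
            # pixels is the renderer's job, so stop here.
            return data
        else:
            raise ValueError("unsupported filter: " + name)
    raise ValueError("no DCTDecode filter in the chain")
```

With a stream that is plain DCTDecode, the same function simply hands the bytes back unchanged.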
NBC says:

August 9, 2013 at 18:23

First, an MCU (Minimum Coded Unit) block is 16 px × 16 px.

An MCU is not necessarily 16×16 although that is the case for the jpeg in question. You should of course know that this depends on the kind of subsampling used on the Cb and Cr channels.
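The dependence on subsampling is easy to put in code: the MCU spans 8 pixels times the largest horizontal and vertical sampling factor among the components. The function name and the example factor lists below are illustrative, not read from the file.

```python
def mcu_size(sampling_factors):
    """Return (width, height) of one MCU in pixels.

    sampling_factors is a list of (horizontal, vertical) factors,
    one pair per component, e.g. Y, Cb, Cr.
    """
    h_max = max(h for h, _ in sampling_factors)
    v_max = max(v for _, v in sampling_factors)
    return 8 * h_max, 8 * v_max


# Common layouts: 4:2:0, 4:2:2 and 4:4:4 chroma subsampling.
mcu_420 = mcu_size([(2, 2), (1, 1), (1, 1)])
mcu_422 = mcu_size([(2, 1), (1, 1), (1, 1)])
mcu_444 = mcu_size([(1, 1), (1, 1), (1, 1)])
```

Only the 4:2:0 layout gives the 16×16 MCU; without chroma subsampling the MCU is a single 8×8 block.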

A FlateDecoded file is not a JPEG compressed file.

Duh…

I love how you are trying to understand the PDF standard but continue to make foolish claims…

JFIF is a stricter standard for JPEG; however, both can be rendered using the same approach. The PDF standard really does not care about JPEG versus JFIF.

You should really let the data guide your conclusions rather than the other way around.

The Xerox WorkCentre matches all the conditions of the specifications. It’s you who do not understand them.
NBC says:

August 9, 2013 at 18:26

Any… day… now…

I guess Hermitian has realized that he was totally unable to extract the JPEG comment using his high level tools and failed…

It has become painfully self evident that the use of Illustrator does not make for a diligent investigation.

Now the poor guy is struggling with PDF and JPEG standards…

So cute..
NBC says:

August 9, 2013 at 18:53

Second the top and left edges of a JPEG image must lie on a 8 x 8 pixel block boundary, but the bottom and right edges need not. This is evidently required to permit JPEG compression of the image file.

Huh… The way JPEG works, the top and left edges ALWAYS lie on an 8×8 pixel block boundary… Duh… And no, the image dimensions need not be a multiple of 8 or 16 unless you want lossless rotation. So the reason why PDF maintains the original orientation of the JPEG is to avoid messing with the JPEG.
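Whether an image could be rotated losslessly comes down to simple arithmetic on its dimensions. The sketch below (function name made up) divides the width and height by the MCU size; the 1664 × 1280 figures are the dimensions of the Obj 12 image, and the 16×16 MCU is the one mentioned above for this JPEG.

```python
def mcu_grid(width, height, mcu_w=16, mcu_h=16):
    """Return (columns, rows, leftover_x, leftover_y) for the MCU grid.

    Non-zero leftovers mean the right or bottom edge ends inside a
    partially used block.
    """
    cols, left_x = divmod(width, mcu_w)
    rows, left_y = divmod(height, mcu_h)
    return cols, rows, left_x, left_y


# Dimensions of the embedded JPEG from the 7535 PDF.
grid = mcu_grid(1664, 1280)
```

For 1664 × 1280 this gives 104 columns and 80 rows with nothing left over, so both edges land on MCU boundaries.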

I am glad that you are finally catching up with the rest of us…

As to JPEG there are several standards you need to be aware of…

And yes, it is a file format standard…

ISO/IEC 10918-1:1994, Digital Compression and Coding of Continuous-Tone Still Images (informally known as
the JPEG standard, for the Joint Photographic Experts Group, the ISO group that developed the standard).

ISO/IEC 15444-2:2004, Information Technology—JPEG 2000 Image Coding System: Extensions.

JFIF improves interchange, as “JFIF defines a number of details that are left unspecified by the JPEG Part 1 standard (ISO/IEC IS 10918-1, ITU-T Recommendation T.81)”.

PDF does not specify necessarily the standard, it just provides for a DCTDecode filter which is subsequently rendered using a JPEG renderer.

From the standard

DCTDecode – Decompresses data encoded using a DCT (discrete cosine transform) technique based on the JPEG standard, reproducing image sample data that approximates the original data.

JPXDecode

(PDF 1.5) Decompresses data encoded using the wavelet- based JPEG2000 standard, reproducing the original image data.

and

The DCTDecode filter decodes grayscale or colour image data that has been encoded in the JPEG baseline format. See Adobe Technical Note #5116 for additional information about the use of JPEG “markers.”

Hermitian, please spend some more time understanding and less time jumping to conclusions.

Even the WorkCentre 7655 specifications do not require the JFIF standard.

Come on Hermitian, stop embarrassing yourself.
NBC says:

August 9, 2013 at 19:03

Hermitian: Why would I want to extract the JPEG from the 7535 file when I have already done that four different ways and got the same results ? And I will match my methods with yours anytime. And besides, as I have already stated, I don’t waste my time on doomed projects.

So you are abandoning your ‘the WH PDF is clearly a forgery’ foolishness? As well as your other doomed and failed hypotheses? We have a wonderful list of your misadventures by now.

Hermitian tries again and fails

Hermitian: Here is the code that blows your Xerox forger to pieces:

NBC: I extracted the string from the WH LFBC PDF:

%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚáâãäåæçèéêñòóôõö÷øùúÿÄ    
ÿÄµw!1AQaq"2B¡±Á    #3RðbrÑ
$4á%ñ&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚâãäåæçèéêòóôõö÷øùúÿþYCbCrÿ

So here is the code from the preview version of the WH 7535

%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚáâãäåæçèéêñòóôõö÷øùúÿÄ    
ÿÄµw!1AQaq"2B¡±Á    #3RðbrÑ
$4á%ñ&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚâãäåæçèéêòóôõö÷øùúÿþYCbCrÿ

Good luck with another failed forgery hypothesis my friend… Do you ever bother to check your work before making foolish claims?
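
For anyone who wants to check this without eyeballing a hex dump: the `ÿþ` right before `YCbCr` is how the bytes FF FE display in Latin-1, and FF FE is the JPEG comment (COM) marker. Here is a minimal Python sketch that walks the marker segments ahead of the scan data and collects any comments. The function name `jpeg_comments` and the demo bytes are made up for illustration; they do not come from any of the tools or files discussed here.

```python
import struct

def jpeg_comments(data):
    """Return the payloads of COM (FF FE) segments found before the scan data."""
    if data[:2] != b"\xff\xd8":           # SOI marker must come first
        raise ValueError("not a JPEG stream (missing FF D8)")
    comments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break                          # lost sync; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte; skip it
            pos += 1
            continue
        if marker in (0xD9, 0xDA):         # EOI or start of scan: headers are over
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xFE:
            comments.append(payload)
        pos += 2 + length
    return comments

# Tiny hand-built stream for illustration: SOI, one COM segment, EOI.
demo = b"\xff\xd8" + b"\xff\xfe" + struct.pack(">H", 2 + 5) + b"YCbCr" + b"\xff\xd9"
print(jpeg_comments(demo))   # [b'YCbCr']
```

To try it on a real extraction, read the file in binary mode (for example `open("stream12.jpg", "rb").read()`) and pass the bytes in.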

Hermitian says:

August 9, 2013 at 21:09

NBC
results ?

“Because none of your methods likely maintain the integrity of the jpeg? That’s the problem with high level tools”

You keep trying to sell me the same Bull !

The problem, you see, is that there is no such thing as a JPEG except for people like you who don’t know any better. JPEG is a file compression standard designed primarily for photographs by photographers.

The file format is JIFF or JIF NOT JPEG you Dummy !

So let’s get something straight Buddy. You don’t get to make up your own rules, standards, specifications or terminology.

Now when you FlateDecode a JIFF formatted file then it’s a JIFF file that has been compressed by the FlateDecode filter. It’s NOT a JPEG file !

Your FlateDecode does more to corrupt your “fictitious JPEG file” than any of my many JIFF extraction tools could possibly do. All these tools produce JFIF formatted bitmaps when requested. These tools produce JIFFs of the same pixel resolution if care is used when setting the extraction parameters.

And then you have the gall to brag about how your Xerox forger buddy then applies DCTDecode (i.e. JPEG compression) on top of the FlateDecode compression. And you don’t even know which filter is applied first. And if that’s not enough, you then apply additional compression in the Preview print to PDF step. And don’t forget the unknown file mangling that the e-mail step to Preview does to your “JPEG”.

So I’ll gladly stand behind and defend my JIFF extracted files — and also the PNG, GIF, TIFF, and BMP formatted files that I have produced.

And I will continue to correct you when you refer to your “JPEG extractions”.
NBC says:

August 9, 2013 at 21:25

Now when you FlateDecode a JIFF formatted file then it’s a JIFF file that has been compressed by the FlateDecode filter. It’s NOT a JPEG file !

It’s a FlateDecode encoded JPG encoded stream. The DCT stream can be extracted and follows the JPEG format. Now, if you use your high level tools, it may turn it into a JFIF formatted file but you can just extract the DCTDecode and call it x.jpg and it will open up in graphics programs.

Your misguided understanding of the issues is funny and actually has helped me to further strengthen my case now that you too have found the YCbCr string and guess what. I have found an online Xerox file which contains the string…

I do so thank you for your continued questioning which serves to strengthen my arguments and helps me improve my hypotheses.

Your statement about applying DCTDecode on top of FlateDecode is totally wrong, it’s the other way around. Understanding is not your strongest point but DCT-encoding a Flate-compressed stream makes no sense.

Geez. This is basic stuff.
Hermitian says:

August 9, 2013 at 23:37

NBC

“Good luck with another failed forgery hypothesis my friend… Do you ever bother to check your work before making foolish claims?”

Oh! Believe me I check all of my work. For example the same string always appears in a JIFF thumbnail. Kevin Davidson’s “before the fact” Obama LFCOLB has a thumbnail. So that file has two JFIF headers.

So you clipped your matching string from your “fictitious JPEG” and associated that string with the Preview PDF? So where’s the string from the Preview PDF Dude?

And more importantly, where’s the string from the Xerox Workcenter 7655 PDF. And don’t use the FlateDecode excuse again unless you know for sure that the FlateDecode compression was applied after the DCTDecode compression. Otherwise, according to WKV that string should be right there in the PDF.

By the way where’s your JFIF label from your “fictitious JPEG” ?
NBC says:

August 9, 2013 at 23:43

And more importantly, where’s the string from the Xerox Workcenter 7655 PDF. And don’t use the FlateDecode excuse again unless you know for sure that the FlateDecode compression was applied after the DCTDecode compression. Otherwise, according to WKV that string should be right there in the PDF.

The 7655 Xerox WorkCentre also encodes the background as /FlateDecode /DCTDecode of course. When saving using Preview, Preview only copies the /DCTDecode data.

I have shown how for the Xerox 7535 WorkCentre, the comment YCbCr can be easily observed.

So you clipped your matching string from your “fictitious JPEG” and associated that string with the Preview PDF? So where’s the string from the Preview PDF Dude?

Nothing fictitious about the jpeg, it is the part after the DCTDecode between stream and endstream. If you copy the one in the Xerox raw file you first need to decompress it (undo the Flate step). I provided you with the simple python script to do this. Alternatively, you can look at the Preview file and copy the part between the two stream tags and save it with a jpg extension. The file then opens beautifully in editors, viewers etc.

You really do not understand PDF format, what DCTDecode means… I can guarantee you that when you see JFIF in your headers or Adobe, you have likely ruined the original contents found in the PDF.

That’s the problem of working with high level tools. They do not explain how and what they do.

I have shown you how both the WH LFBC PDF and the Xerox Preview PDF contain the same DCTDecode data, at least where it matters.

If you cannot comprehend the relevance of this, that is not really my problem.

PS: I have located another WorkCentre JPEG online which contains YCbCr… Thanks to your suggestions. Again I do thank you for helping me make my case.
NBC says:

August 9, 2013 at 23:45

Oh and I also had the AP document run through the Xerox scanner to study JBIG2 and so far the data again show how it creates identical characters where their shapes are closely related.

And step by step, I support my work flow with more and more evidence…

Thanks for your encouragements.
NBC says:

August 9, 2013 at 23:45

And don’t use the FlateDecode excuse again unless you know for sure that the FlateDecode compression was applied after the DCTDecode compression.

You can see that this is the case by opening the PDF in a text editor. Wow.. This is so trivial that I am discouraged that I have to once again explain this in step by step fashion to you.
NBC says:

August 9, 2013 at 23:48

By the way where’s your JFIF label from your “fictitious JPEG” ?

There is none as there is none in the original data. Geez… and you do not need a JFIF label to be able to open up the file in a viewer.

Sigh…
Hermitian says:

August 10, 2013 at 00:12

NBC

“It’s a FlateDecode encoded JPG encoded stream. The DCT stream can be extracted and follows the JPEG format. ”

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

What JPEG Format standard ? It doesn’t exist. There is a JIFF and JIF format standard. So if you are right then your “fictitious JPEG formatted file” should have a JPEG label in line one. So what about posting that line from your “JPEG extraction” including the line number. And then we will know for sure that your JPEG extractor doesn’t measure up.

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

“Now, if you use your high level tools, it may turn it into a JFIF formatted file but you can just extract the DCTDecode and call it x.jpg and it will open up in graphics programs.”

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

So let’s make sure I’m completely understanding you. What you’re really saying is that embedded within the PDF there is a real JPEG formatted file but my “high level” tools turn the pristine JPEG formatted file into a bad ole JFIF file ?

That’s what I thought you meant…unfortunately. You can’t be serious man !

Besides you are clueless about the capabilities of my software tools. Let me suggest that you concentrate on figuring out what your own tools are doing and let me worry about what my tools are doing.

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

“Your statement about applying DCTDecode on top of FlateDecode is totally wrong, it’s the other way around. Understanding is not your strongest point but DCT-encoding a Flate-compressed stream makes no sense.”

“Geez. This is basic stuff.”

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Right — basic stuff that you should have nailed down months ago.

OK ! Then today’s work flow is:

1. Xerox Workcenter applies DCTDecode compression to the scanned bitmap file.

2. Xerox Workcenter then applies FlateDecode filter to the already compressed bitmap file.

3. Preview “print to PDF” applies DCTDecode compression to the doubly compressed bitmap file.

I agree that applying DCT decoding on top of Flate decoding makes no sense. So why are you doing just that ?

.
Hermitian says:

August 10, 2013 at 00:17

NBC

“The 7655 Xerox WorkCenter also encodes the background as /FlateDecode /DCTDecode of course. When saving using preview, preview only copies the /DCTDecode data.”

“Your statement about applying DCTDecode on top of FlateDecode is totally wrong, it’s the other way around. Understanding is not your strongest point but DCT-encoding a Flate-compressed stream makes no sense.”

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Does anybody else see any conflict between these two statements of NBC’s ?

What the Hell are you really doing NBC ?
W. Kevin Vicklund says:

August 10, 2013 at 00:30

And more importantly, where’s the string from the Xerox Workcenter 7655 PDF. And don’t use the FlateDecode excuse again unless you know for sure that the FlateDecode compression was applied after the DCTDecode compression.

Try reading the PDF Reference manual provided by Adobe, Hermie. For instance, from page 22 of the 1.7 version (and it’s the same for earlier versions, so long as the compression format is supported):

The filter or filters for a stream shall be specified by the Filter entry in the stream’s dictionary (or the FFilter entry if the stream is external). Filters may be cascaded to form a pipeline that passes the stream through two or more decoding transformations in sequence. For example, data encoded using LZW and ASCII base-85 encoding (in that order) shall be decoded using the following entry in the stream dictionary:

EXAMPLE 2
/Filter [/ASCII85Decode /LZWDecode]

BTW, here’s a quote from the Wikipedia article on JPEGs.

Image files that employ JPEG compression are commonly called “JPEG files”, and are stored in variants of the JIF image format.

Funny how Hermie failed to quote that part…
NBC says:

August 10, 2013 at 00:37

Does anybody else see any conflict between these two statements of NBC’s ?

Ah, I notice you are not familiar with the language of PDF. /FlateDecode/DCTDecode means the DCT (JPEG) encoding was applied first and the Flate compression second; to decode, you run the filters in the listed order. It’s a chain…

I assumed you had read the manual.

For example, data encoded using LZW and ASCII base-85 encoding (in that order) shall be decoded using the following entry in the stream dictionary:

EXAMPLE 2 /Filter [/ASCII85Decode /LZWDecode]

RTFM my friend, it avoids so much embarrassment.
Reality Check says:

August 10, 2013 at 00:44

I doubt Hermie knows what the acronym RTFM means because he has never in his life, even once, RTFM. 😆
NBC says:

August 10, 2013 at 00:51

I doubt Hermie knows what the acronym RTFM means because he has never in his life, even once, RTFM. 😆

I try, I really try but I keep overestimating his familiarity with these concepts…
Hermitian says:

August 10, 2013 at 00:59

NBC

“Your misguided understanding of the issues is funny and actually has helped me to further strengthen my case now that you too have found the YCbCr string and guess what. I have found an online Xerox file which contains the string…”

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

So let me take a whack at translating this NBC blather. Given the doubly compressed Xerox Workcenter PDF file and one extraction step and two decompressions out falls a YCbCr label which just happens to be the default color space for the JFIF standard.

You really can’t make this stuff up. ROTFL….

And I found the YCbCr label in line 616 of the WH LFCOLB PDF weeks ago. Then yesterday I learned that neither you nor Vicklund knew it was there even after I posted my findings more than once. And just so you don’t miss it again … I said PDF…not JPEG or JFIF.

So the score is one for me and zero for you and Vicklund.

You’ve yet to post any finding of your YCbCr in the Xerox Workcenter PDFs.

And I’ve already searched the Xerox 7535 PDF and it’s not there.

Here’s some questions for you.

Of all the files that you have tracked down that have the YCbCr label how many are PDFs and how many are JFIFs ?

And how many of these same files were run through Preview for the print to PDF step?
W. Kevin Vicklund says:

August 10, 2013 at 01:02

OK ! Then today’s work flow is:

1. Xerox Workcenter applies DCTDecode compression to the scanned bitmap file.

2. Xerox Workcenter then applies FlateDecode filter to the already compressed bitmap file.

3. Preview “print to PDF” applies DCTDecode compression to the doubly compressed bitmap file.

I agree that applying DCT decoding on top of Flate decoding makes no sense. So why are you doing just that ?

Wrong! Are you utterly incapable of reading for comprehension? If a bitmap is doubly compressed, it must be doubly decompressed to display properly. Let’s correct your errors:

1. Xerox Workcenter applies DCT compression to the scanned bitmap file.

2. Xerox Workcenter then applies deflate compression to the already compressed bitmap file.

3. Opening in Preview then applies the FlateDecode filter to decompress the doubly compressed bitmap file.

4. Preview then applies the DCTDecode filter to decompress the now singly compressed bitmap file.

5. Preview “print to PDF” applies DCT compression to the uncompressed bitmap file.
W. Kevin Vicklund says:

August 10, 2013 at 01:38

And I found the YCbCr label in line 616 of the WH LFCOLB PDF weeks ago.

Right where NBC said it was, months ago. In the bitstream for the DCT encoded object – commonly called a JPEG.
NBC says:

August 10, 2013 at 01:42

And I found the YCbCr label in line 616 of the WH LFCOLB PDF weeks ago. Then yesterday I learned that neither you nor Vicklund knew it was there even after I posted my findings more than once. And just so you don’t miss it again … I said PDF…not JPEG or JFIF.

Again you are wrong. The presence of the label was announced many weeks ago. There appear to be some problems with your short term memory.

And I’ve already searched the Xerox 7535 PDF and it’s not there.

And yet I have shown it is there so you are again showing operator error.

Look my friend, your continued ignorance of basic principles and your unfamiliarity with basic computer tools has led you down a wrong path.

Come on Hermitian, stop embarrassing yourself with these foolish claims. I pointed out the embedded string in the jpeg that I extracted as well as by showing that it exists in the Whitehouse PDF and PDF’s created by Xerox.

You really do not understand this now do you?…

I repeat the same steps as I hypothesized were used for the WH PDF and all these items drop in line. And you claim that somehow you are winning.

Oh the smell of ignorance in the morning.
NBC says:

August 10, 2013 at 01:43

5. Preview “print to PDF” applies DCT compression to the uncompressed bitmap file.

Nope, Preview copies the original DCT-encoded stream as is, since recompressing the bitmap would lead to either an inflation in size or a reduction in quality.

That’s why the YCbCr comment ‘survives’ Preview…
NBC says:

August 10, 2013 at 01:45

And just so you don’t miss it again … I said PDF…not JPEG or JFIF.

There is really no difference as DCTDecode is JPEG and thus if you find it in the DCTDecode stream you will obviously find it in the jpeg.

Have you been asleep all these weeks where I have been trying to educate you about PDF… With little success it seems, looking at the filter debacle.
NBC says:

August 10, 2013 at 02:09

Right where NBC said it was, months ago. In the bitstream for the DCT encoded object – commonly called a JPEG.

Not much remains of poor Hermitian’s ‘forgery’ claims now that we have so many explained artifacts through a simple workflow.

Next steps:

JBIG2 effects
Halo formation
W. Kevin Vicklund says:

August 10, 2013 at 02:10

you can just extract the DCTDecode and call it x.jpg and it will open up in graphics programs.

I don’t think this is strictly true. I think this works if the graphics program “knows” it is receiving a DCT encoded file, but as a standalone file I think it does need the JFIF or JIF header to be opened by a generic graphic program.
NBC says:

August 10, 2013 at 02:16

I am not sure. The signature for jpeg is quite easy to detect: 0xFF 0xD8 at the beginning.

I can call it stream12.abc and it still opens as a jpeg in Firefox
NBC says:

August 10, 2013 at 02:17

Photoshop does not like the .abc but does load when it is called .jpg
NBC says:

August 10, 2013 at 02:22

JFIF places its data in APP0 or tag FF E0 and I bet you I can remove all of it and it will still load.

APP0 deleted, still works, there was really not much data in it.

EXIF stream removed, still works

JFIF is not really a requirement; all the necessary information is stored outside the APP0, I believe. JFIF resolved some ambiguities…

The main areas of incompatibility are RGB encoding (which is not a very good encoding for JPEG anyway) and how the image is stored, top row first… A DCTDecode stream is for all practical purposes compatible with JFIF. Of course, there are PDF parameters which can be used to indicate variations but none of them apply to the Xerox examples.
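
The APP0 experiment can be sketched in Python. This is a minimal sketch, assuming a well-formed header with no stray fill bytes; `strip_app0` is a hypothetical helper name and the demo bytes are made up, not taken from the Xerox file. It copies every header segment except APP0 (FF E0) and leaves the scan data untouched.

```python
import struct

def strip_app0(data):
    """Copy a JPEG stream, dropping APP0 (FF E0) segments from the header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker FF D8")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xD9, 0xDA):         # EOI or start of scan: stop parsing
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker != 0xE0:                 # keep everything that is not APP0
            out += data[pos:end]
        pos = end
    out += data[pos:]                      # remainder (scan data) copied verbatim
    return bytes(out)

# Made-up stream: SOI, a JFIF-style APP0, a COM segment, EOI.
app0 = b"\xff\xe0" + struct.pack(">H", 2 + 5) + b"JFIF\x00"
com = b"\xff\xfe" + struct.pack(">H", 2 + 5) + b"YCbCr"
demo = b"\xff\xd8" + app0 + com + b"\xff\xd9"
stripped = strip_app0(demo)
```

The result still starts with FF D8 and still carries the comment segment. Whether a particular viewer accepts a file without APP0 is something to test per program, as with Firefox and Photoshop above.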
Hermitian says:

August 10, 2013 at 10:57

NBC

BTW, here’s a quote from the Wikipedia article on JPEGs.

“Image files that employ JPEG compression are commonly called “JPEG files”, and are stored in variants of the JIF image format.”

HHHHHHHHHHHHHHHHHHHHHHHHHHH
There he goes again! Too old to change old habits. So a JIFF formatted file that has been compressed by FlateDecode is NOT a JPEG file but rather a JIFF file.

And did you notice that NBC gave a couple of examples but didn’t really say what order he is using. You see Engineers know what to watch for because they run into these slippery fellows all the time. And NBC is one of those slippery types who don’t acknowledge that the word accountability is really in the dictionary.

So NBC ! Which order does your Xerox forger use to apply the FlateDecode and DCTDecode filters to the background bitmap?

Is it PDF /FlateDecode/DCTDecode

or is it PDF /DCTDecode/FlateDecode ?

Let me guess — your answer is both.
Hermitian says:

August 10, 2013 at 11:41

WKV

W. Kevin Vicklund says:

August 10, 2013 at 01:02

“”OK ! Then today’s work flow is:

““1. Xerox Workcenter applies DCTDecode compression to the scanned bitmap file.

“”2. Xerox Workcenter then applies FlateDecode filter to the already compressed bitmap file.

“”3. Preview “print to PDF” applies DCTDecode compression to the doubly compressed bitmap file.

“”I agree that applying DCT decoding on top of Flate decoding makes no sense. So why are you doing just that ?””

”
Wrong! Are you utterly incapable of reading for comprehension? If a bitmap is doubly compressed, it must be doubly decompressed to display properly. Let’s correct your errors:

1. Xerox Workcenter applies DCT compression to the scanned bitmap file.

2. Xerox Workcenter then applies deflate compression to the already compressed bitmap file.

3. Opening in Preview then applies the FlateDecode filter to decompress the doubly compressed bitmap file.

4. Preview then applies the DCTDecode filter to decompress the now singly compressed bitmap file.

5. Preview “print to PDF” applies DCT compression to the uncompressed bitmap file.
”
HHHHHHHHHHHHHHHHHHHHHHHHHHHH
So can we get NBC to vouch for your filter order ?

So you are admitting that your workflow does multiple compression/decompression/re-compression steps.

Now both FlateDecode and DCTDecode are lossy filters. Therefore every compression/decompression cycle throws away information, mostly in the high-frequency end of the spectrum. That’s not a problem for continuous-tone photographs but the background layer is not a continuous-tone photograph. The Green cross-hatch safety paper background carries no useful information (except for the anomalies that indicate forgery). Only the form lines and text convey useful information. Consequently, these filters should not be used at all to compress a birth certificate document. But the bottom line is that you are using them and therefore we know for a fact that you are degrading the image with every application of compression or decompression.

I have read that you are examining individual characters and are seeing identical characters down to the individual pixel. That is not unusual for compressed PDF documents. Wherever possible, the compression algorithms for PDF take advantage of repeating patterns. But that is not a reliable measure of image quality. You should be comparing each character of the PDF image to the same character in the paper original.
Hermitian says:

August 10, 2013 at 11:50

NBC

“”And I’ve already searched the Xerox 7535 PDF and it’s not there.””

“And yet I have shown it is there so you are again showing operator error.”

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Please post the line number which contains the YCbCr label in the Xerox document “wh-lfbc-scanned-xerox-7535-wc.pdf”.
Hermitian says:

August 10, 2013 at 12:05

NBC

Come on Hermitian, stop embarrassing yourself with these foolish claims. I pointed out the embedded string in the jpeg that I extracted as well as by showing that it exists in the Whitehouse PDF and PDF’s created by Xerox.

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Please post the Xerox PDF files for which you have searched and located the YCbCr label along with the line numbers of same.

It’s that little comment of yours “that I extracted” which worries me.

Here is what Didier Stevens has to say about PDF stream objects.

http://blog.didierstevens.com/2008/05/19/pdf-stream-objects/

I especially liked these two comments from his blog:

”
2. Some of these filters cannot be used to hide scripts with exploits, because they do lossy compression and are suitable only for images. I think (but am not 100% sure) that CCITTFaxDecode, JBIG2Decode, DCTDecode and JPXDecode all fall in this category. They might be usable for a denial-of-service attack (the equivalent of the ZIP bomb), although I have my doubts about that too.
Comment by Vesselin Bontchev — Wednesday 28 May 2008 @ 18:03

3. It’s true that these filters are lossy, but the first 3 of them take parameters, and I believe it’s possible to parameterize a lossless compression. But I have not tested this.
Comment by Didier Stevens — Wednesday 28 May 2008 @ 20:40
”

I’m sure you remember that Didier wrote the software tools that you are using.
Hermitian says:

August 10, 2013 at 14:18

From Wikipedia

“Typical usage[edit]

“The JPEG compression algorithm is at its best on photographs and paintings of realistic scenes with smooth variations of tone and color. For web usage, where the amount of data used for an image is important, JPEG is very popular. JPEG/Exif is also the most common format saved by digital cameras.

“On the other hand, JPEG may not be as well suited for line drawings and other textual or iconic graphics, where the sharp contrasts between adjacent pixels can cause noticeable artifacts. Such images may be better saved in a lossless graphics format such as TIFF, GIF, PNG, or a raw image format. The JPEG standard actually includes a lossless coding mode, but that mode is not supported in most products.

“As the typical use of JPEG is a lossy compression method, which somewhat reduces the image fidelity, it should not be used in scenarios where the exact reproduction of the data is required (such as some scientific and medical imaging applications and certain technical image processing work).

“JPEG is also not well suited to files that will undergo multiple edits, as some image quality will usually be lost each time the image is decompressed and recompressed, particularly if the image is cropped or shifted, or if encoding parameters are changed – see digital generation loss for details. To avoid this, an image that is being modified or may be modified in the future can be saved in a lossless format, with a copy exported as JPEG for distribution.”

http://en.wikipedia.org/wiki/JPEG#Syntax_and_structure
NBC says:

August 10, 2013 at 18:37

I’m sure you remember that Didier wrote the software tools that you are using.

Yes, and other software tools. Indeed, hackers have found some pretty innovative ways to hide executable javascript in PDF’s which can cause a bit of havoc. But again, I have provided you with the Xerox PDF and the Xerox PDF saved with Preview. The latter shows the YCbCr tag. The former has the tag FlateDecoded so you need to extract the object (cut and paste in hex editor) and then deflate it.
I have even given you the tool. Have you followed my suggestions?
NBC says:

August 10, 2013 at 18:38

Please post the line number which contains the YCbCr label in the Xerox document “wh-lfbc-scanned-xerox-7535-wc.pdf”.

Object 12, which you need to decompress first as it is encoded with FlateDecode. How many more times do I have to tell this to you? Take a text file and type YCbCr, and zip it up… What line contains the YCbCr tag?…
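
The zip experiment is easy to run in Python with zlib, which implements the same Flate compression that FlateDecode undoes. A minimal sketch with a made-up payload (the bytes below are illustrative, not the real Object 12) searches for the comment before and after decompression:

```python
import zlib

# Made-up stand-in for a DCT stream: SOI, COM segment with "YCbCr", filler, EOI.
payload = b"\xff\xd8\xff\xfe\x00\x07YCbCr" + bytes(range(256)) * 8 + b"\xff\xd9"

compressed = zlib.compress(payload)        # what sits between stream/endstream
restored = zlib.decompress(compressed)     # what FlateDecode hands to DCTDecode

print("in compressed bytes:", b"YCbCr" in compressed)
print("in restored bytes:  ", b"YCbCr" in restored)
print("round trip exact:   ", restored == payload)
```

The first check will normally come out False, because Flate re-encodes the bytes as bit-packed codes, so no line of the raw PDF holds the readable string. The other two come out True, because Flate is lossless.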
NBC says:

August 10, 2013 at 18:43

Hermitian: So can we get NBC to vouch for your filter order ?

So you are admitting that your workflow does multiple compression/decompression/re-compression steps.

Now both FlateDecode and DCTDecode are lossy filters.

DCTDecode is lossy, FlateDecode is not. You’re not much of an expert in this area. And the workflow is simple:

The Xerox WorkCentre adds a ‘zip’ like step after its DCT (jpeg) encoding step.
Preview will unzip the object and render the DCTDecode (jpeg) as a bitmap. When saving, Preview just stores the DCTDecode (jpeg) code. This is something I have tested and so can you.

You are totally clueless in your latest remarks… It’s not us who are using multiple filters, it’s the workflow and yet we still maintain the string as, for obvious reasons, software does not like to touch JPEG encoded objects.

And if you cannot understand this then you have shown yourself clueless about the DCTDecode step.
NBC says:

August 10, 2013 at 18:57

So NBC ! Which order does your Xerox forger use to apply the FlateDecode and DCTDecode filters to the background bitmap?

Is it PDF /FlateDecode/DCTDecode

or is it PDF /DCTDecode/FlateDecode ?

Let me guess — your answer is both.

I guess poor Hermitian still cannot read the PDF standards…

The file shows

/FlateDecode/DCTDecode which means that it applies JPEG compression first and compresses the jpeg using Flate.

This is not just the only logical interpretation but also supported by the PDF standard.

Come on Hermitian, stop embarrassing yourself.

Why would anyone use DCT on a ‘zip’ file…
In order to reverse the compression steps, you first undo the Flate step (inflate), which should give you a DCTDecode object. If you rename the object to jpg, it will open up in most renderers.
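
The order can be sketched in Python. This is a minimal sketch, assuming a made-up stand-in for the JPEG bytes; `decode_stream` is a hypothetical helper, and it only undoes Flate, since the DCT stage is left to whatever JPEG renderer opens the result.

```python
import zlib

def decode_stream(raw, filters):
    """Apply PDF-style filters in listed order, stopping at DCTDecode."""
    data = raw
    for name in filters:
        if name == "FlateDecode":
            data = zlib.decompress(data)   # lossless: undoes the writer's last step
        elif name == "DCTDecode":
            break                          # data is now the JPEG stream itself
        else:
            raise ValueError("filter not handled in this sketch: " + name)
    return data

# Writer side: JPEG-encode first (faked here), then Flate-compress the result.
fake_jpeg = b"\xff\xd8\xff\xfe\x00\x07YCbCr" + b"\x00" * 64 + b"\xff\xd9"
stream = zlib.compress(fake_jpeg)

# Reader side: filters are applied in the order the /Filter array lists them.
jpeg = decode_stream(stream, ["FlateDecode", "DCTDecode"])
```

After the Flate step, `jpeg` is byte for byte what the writer compressed, starting with FF D8 and carrying the comment, which is why it can simply be saved with a .jpg extension.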
Hermitian says:

August 11, 2013 at 22:50

NBC

NBC says:

August 10, 2013 at 18:37

““I’m sure you remember that Didier wrote the software tools that you are using.””

“Yes, and other software tools. Indeed, hackers have found some pretty innovative ways to hide executable javascript in PDF’s which can cause a bit of havoc. But again, I have provided you with the Xerox PDF and the Xerox PDF saved with Preview. The latter shows the YCbCr tag. The former has the tag FlateDecoded so you need to extract the object (cut and paste in hex editor) and then deflate it.
I have even given you the tool. Have you followed my suggestions?”

You only provided the Xerox 7535 Workcenter PDF file — not the companion Preview PDF. And as I pointed out to you several times this single file that you have released does not contain the YCbCr label.

Why don’t you and WKV put your heads together and get on the same page as to the exact sequence of filters from the first to the last. And maybe you could also indicate when the YCbCr label should be selectable at each step for the Xerox and the Preview PDFs.

Of course you would need to know the effect that each filter has on the presence of the YCbCr label in the PDF…
NBC says:

August 11, 2013 at 22:59

You only provided the Xerox 7535 Workcenter PDF file — not the companion Preview PDF. And as I pointed out to you several times this single file that you have released does not contain the YCbCr label.

So you missed me posting the file… Figures…

And the file I provided before also contains the YCbCr comment, you just have to deflate the right object. I have provided you with the relevant steps to do so, you appear to be unable, unwilling or incapable of following these simple instructions, and I have provided you with the raw and deflated information.

Check out here

Geez Hermitian, pay attention please.

The effect of ZIP on the label is trivial: unzipping takes the encoded data and decodes it losslessly, and the resulting DCTDecode data stream contains the comment.

It’s so simple, so logical so I guess I will have to spend more time educating our friend.

1. Read the PDF standard which explains the order
2. Understand that DCTDecode stream IS jpeg
3. Understand that FlateDecode stream IS zlib

And follow the instructions I have provided for you.

Just once show that you can do proper research my friend. I am getting troubled here about your inability or unwillingness to take these simple steps.
Hermitian says:

August 11, 2013 at 23:25

NBC

Your little exercise with YCbCr in the text editor followed by zip adds zeros between each character. This expanded Y.C.b.C.r. also has both leading and trailing zeros. And no special position in the zip file.

Hardly relevant to the question of the validity of your work.
NBC says:

August 11, 2013 at 23:27

NBC

Your little exercise with YCbCr in the text editor followed by zip adds zeros between each character. This expanded Y.C.b.C.r. also has both leading and trailing zeros. And no special position in the zip file.

Hardly relevant to the question of the validity of your work.

Huh? Are you now blaming me again for your unfamiliarity with the tools or your inability to understand what I am saying?

Sigh…

Why is Hermitian refusing to repeat my simple instructions that would extract the jpeg from the PDF?
NBC says:

August 11, 2013 at 23:40

As to the zeros, it appears your editor is saving as UTF-16… Look Hermitian, I can help you do these experiments, but I assume a minimum level of skills here.
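
The zeros are what UTF-16 looks like in a hex editor. A quick Python check (an illustration of the encoding only, not of any particular editor's behaviour):

```python
text = "YCbCr"

as_ascii = text.encode("ascii")
as_utf16 = text.encode("utf-16-le")   # little-endian, no byte order mark

print(as_ascii)   # b'YCbCr'
print(as_utf16)   # b'Y\x00C\x00b\x00C\x00r\x00'
```

Every character is followed by a zero byte, so a byte-wise search for the five-byte ASCII string fails on such a file. Saving the test file as plain ASCII or UTF-8 avoids this.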

Comments are closed.

Native and Natural Born Citizenship Explored

Where native and natural coincide
