Educating the Confused – Deflate

For some reason unknown, our almost resident expert Hermitian has suggested that FlateDecode is a lossy compression.

Wikipedia may be helpful in correcting that impression: FlateDecode is the Deflate algorithm used by zlib, and Deflate is lossless. Of course, anyone familiar with zlib would have known and understood this.
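For anyone who would rather check than take Wikipedia's word for it, here is a minimal sketch using Python's standard zlib module. The sample bytes are an arbitrary stand-in for the contents of a PDF stream, not data from any actual document.

```python
import zlib

# Arbitrary sample bytes standing in for a PDF stream's contents (an assumption).
original = bytes(range(256)) * 64

compressed = zlib.compress(original, 9)   # roughly what a Flate-encoded stream stores
restored = zlib.decompress(compressed)    # what applying FlateDecode gives back

print(len(original), len(compressed))
print(restored == original)               # lossless means every byte comes back
```

If the round trip ever returned different bytes, the compression would be lossy. With Deflate it does not.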

PDF compression filters that are, or can be, lossy include

  • DCTDecode – Aka JPEG, always lossy even at highest quality settings
  • JPXDecode – Lossless or lossy, wavelet-based JPEG 2000
  • JBIG2Decode – Lossless or lossy

And let me also explain why I believe most programs do not touch DCTDecode data. It has been argued and observed that Preview maintains the exact DCTDecode stream. The reason is straightforward: the alternatives are to decode the JPEG and store the resulting bitmap at high quality, in which case the file size would explode, or to recompress the bitmap, which adds a second round of lossy compression on top of the first. I believe that Adobe tools are more destructive here.

The way a PDF Editor typically works is that it maintains two different ‘trees’. One contains the PDF tree with all the raw objects, the other contains the rendered information. When objects are not touched, their raw data is written back to avoid the problems with JPEG. This is also why Preview maintains the landscape orientation of the images.
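The write-back rule can be sketched in a few lines. This is only an illustration of the idea, not Preview's or any other editor's actual code: the `StreamObject` class, its `edited` field, and the `serialize` function are hypothetical names, and the JPEG bytes are a stand-in.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamObject:
    raw: bytes                       # bytes exactly as stored in the PDF file
    edited: Optional[bytes] = None   # decoded data, set only if the object was changed

def serialize(obj: StreamObject) -> bytes:
    """Return the bytes to write for this object when saving."""
    if obj.edited is None:
        return obj.raw                # untouched: copy through, no re-encoding
    return zlib.compress(obj.edited)  # touched: re-encode (lossless Flate in this sketch)

# Stand-in for an embedded JPEG stream that the user never modified.
jpeg_stream = StreamObject(raw=b"\xff\xd8 stand-in JPEG bytes \xff\xd9")
print(serialize(jpeg_stream) == jpeg_stream.raw)
```

Because the untouched stream never passes through a JPEG encoder again, it cannot pick up a second generation of loss.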

I can appreciate that, to someone who was recently introduced to PDF encoded data, it may appear somewhat overwhelming, and thus when seeing a /FlateDecode/DCTDecode statement for the filter, one may be initially confused by the order. However, a quick logical analysis of the two possibilities would lead one to quickly eliminate the flow where the bitmap was first zipped up and then the zip file was somehow encoded in a lossy fashion. Imagine the surprise when trying to inflate the data, only to find out it is no longer a valid encoding because DCT has managed to mess it all up.

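Alternatively, one could also have read the PDF standard documentation, which specifies that multiple filters are listed in the order in which they are applied when decoding. For /FlateDecode/DCTDecode that means the reader inflates first and JPEG-decodes second, so the scanner must have JPEG-encoded first and deflated second.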
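The ordering argument can be tried out with a short sketch. It assumes Python's standard zlib module, uses a made-up byte string in place of a real JPEG, and replaces the actual JPEG decoder with a simple check for the start-of-image marker; `decode_stream` is a hypothetical helper, not part of any PDF library.

```python
import zlib

def decode_stream(data: bytes, filters: list[str]) -> bytes:
    """Apply the listed filters in array order, as a PDF reader does when decoding."""
    for name in filters:
        if name == "FlateDecode":
            data = zlib.decompress(data)
        elif name == "DCTDecode":
            # Stand-in: a real reader would run a JPEG decoder here.
            # This sketch only checks for the JPEG start-of-image marker.
            if not data.startswith(b"\xff\xd8"):
                raise ValueError("DCTDecode input is not JPEG data")
        else:
            raise ValueError(f"unsupported filter: {name}")
    return data

# Hypothetical stand-in for the scanner's JPEG output (not a real image).
jpeg_like = b"\xff\xd8" + b"stand-in JPEG payload " * 50 + b"\xff\xd9"

# Encoding runs in the reverse order of the filter list: JPEG first, then Deflate.
stored = zlib.compress(jpeg_like)
filters = ["FlateDecode", "DCTDecode"]
print(decode_stream(stored, filters) == jpeg_like)

# The rejected flow: altering the Deflate bytes, as a lossy step would,
# should leave something zlib refuses to inflate.
damaged = bytearray(stored)
damaged[len(damaged) // 2] ^= 0xFF
try:
    zlib.decompress(bytes(damaged))
    inflated_ok = True
except zlib.error:
    inflated_ok = False
print(inflated_ok)
```

Swapping the filter list to DCTDecode followed by FlateDecode makes the sketch fail at the first step, because the stored bytes begin with a zlib header rather than a JPEG marker.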

9 thoughts on “Educating the Confused – Deflate”

  1. NBC

    “thus when seeing a /FlateDecode/DCTDecode statement for the filter, one may be initially confused by the order. However, a quick logical analysis of the two possibilities would lead one to quickly eliminate the flow where the bitmap was first zipped up and then the zip file was somehow encoded in a lossy fashion. Imagine”

    Or one, namely NBC, could have just answered my direct question regarding the order and that would have avoided all of this back and forth. And actually it was WKV who first addressed my question with NBC reluctantly following. By the way my PDF code parser assigns either 0 or 1 to the filter depending on the order. But it was very important to nail down the order that your Xerox Workcenter applies to the scanned bitmap.

  2. He did answer first, multiple times, but you are apparently too stupid to understand.

  3. But it was very important to nail down the order that your Xerox Workcenter applies to the scanned bitmap.

    It was in the PDF and the PDF standard would have explained it all to you. Instead, you preferred to remain uninformed.
    Stop blaming others, apply logic, reason and perhaps some curiosity… It’s a great intellectual exercise.

  4. He did answer first, multiple times, but you are apparently too stupid to understand.

    A hypothesis which is gaining in strength. I cannot believe how much time I spend educating poor Hermitian when all the data are available to him.

    Oh well…

  5. NBC says:

    August 11, 2013 at 05:21

    “A hypothesis which is gaining in strength. I cannot believe how much time I spend educating poor Hermitian when all the data are available to him.”

    It’s truly amazing how peeved and cranky NBC gets right after I drag a fact out of him that narrows down the possible storylines that he could spin in the future about all the great results that he is hoarding. And his has been one of the loudest Obot voices demanding that Zullo and his posse release every scrap of data and evidence that they have. That figures for the slippery type that have the unaccountability character flaw just like Obama. You know “birds of a feather flock together.”

    Have you run into Obummer at the Xerox Workcenter lately NBC ?

    1. Where’s the object boundary data for your best shot Dude ? Not ready for press yet ? You know the trouble with dumb machines is that they tend to repeat their failures. Like when your Xerox 7535 totally missed the nearest text pixels. And the 7535 also didn’t satisfy the block edge alignment for top and left edges.

    And then shortly thereafter the Xerox 7655 became your favorite play toy.

  6. 1. Where’s the object boundary data for your best shot Dude ? Not ready for press yet ? You know the trouble with dumb machines is that they tend to repeat their failures. Like when your Xerox 7535 totally missed the nearest text pixels. And the 7535 also didn’t satisfy the block edge alignment for top and left edges.

    You must have missed my posting where I showed the 7535 satisfies the block edge alignments… ROTFL… You’re so much fun though as you continue to help me strengthen my claims.

    You’re the best…

  7. And his has been one of the loudest Obot voices demanding that Zullo and his posse release every scrap of data and evidence that they have.

    Making up stories again eh? Fascinating… Poor Hermitian is upset that I have close to a dozen artifacts that can be explained by the Xerox work flow… And when he insists I release more because he does not believe that the images align at the 8-bit boundaries, he shows himself to be too lazy to do the work himself.

    I have provided him with all the necessary data and tools and he still cannot extract a simple jpeg encoded image from a PDF…

    Come on Hermitian, you can do better than making foolish ad hominems and strawmen…

    Don’t worry Hermitian, in the end you have made great contributions to my workflow hypothesis, allowing me to strengthen it step by step, while trying to educate you on simple concepts.

    So how is the work on examining Dr C’s thumbnail going? Have you concluded that it is likely a forgery? 🙂

    I honestly do not care what Zullo and his posse release, as I doubt that they have anything that amounts to anything.

    As to me releasing my data? As I explained before, I will release the data in due time, while I continue to verify and double check just like a good researcher would do. In the meantime I have released the 7535 workflow documents which, not surprisingly, explain most of the artifacts already.

    Other than you whining that I release more data that anyone with the right tools and mindset could collect themselves, your contributions have been less than impressive lately.

    Hope that all is well my friend, as I continue to rely on you to help me strengthen my hypothesis.

  8. It’s truly amazing how peeved and cranky NBC gets right after I drag a fact out of him that narrows down the possible storylines that he could spin in the future about all the great results that he is hoarding.

    He’s not cranky because you “dragged a fact out of him.” He’s cranky because he’s spent since at least July 20 – three weeks! – trying to tell you this particular fact, which you repeatedly rejected.

    NBC, on July 20, said:

    Yes, as I showed, the JPG from Xerox is encoded in DCTDecode (JPG) and then in lossless FlateDecode. Somewhat of an overkill.
    When you open the document in Preview and print as pdf, you will find that Preview cleans up the double compression.

  9. He’s cranky because he’s spent since at least July 20 – three weeks! – trying to tell you this particular fact, which you repeatedly rejected.

    Stop confusing Hermitian with facts. I doubt he even understood what I was saying then… He still appears to be somewhat confused… But I am a patient person, sometimes a bit exasperated by the depth of Hermitian’s knowledge in some of these areas… Shocking I would say.
