OCR file size revisited

I was excited to find the new OCR option in the DTPO 2.0 final release that sets the resolution to “same as scan”. As I understand it, this should mean that the OCR text layer is added but the image itself isn’t changed, which should keep file size & image quality similar to the original document. I’m using version 2.0.1 now & have just tried this.

I have just OCR’d a four-page document (combination of greyscale & colour pages) using various options & the results are surprising:

(1) Original document file size 872 KB.

(2) OCR in Adobe Acrobat using the “Image (Exact)” setting. This adds an invisible text layer to the document, but leaves the image layer intact so the image quality is maintained. File size = 896 KB.

(3) OCR in DTPO using 150dpi & 50% quality. Significant downgrading of image quality noted, as expected. This is acceptable for archival purposes for documents where image quality isn’t essential. This is my default setting for scanned documents, but I process using Acrobat rather than DTPO. File size = 598 KB.

(4) OCR in DTPO using “same as scan” setting. Image quality maintained. File size = 1.5 MB.

Now, that’s surprising. Why has “same as scan” increased the file size so much, from 872 KB to 1.5 MB? I would have expected a result close to the Acrobat-processed file (896 KB).
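
To put the four figures above in proportion, here is a quick sketch of the arithmetic in Python. The labels are just names I made up for the four tests, and I'm assuming 1.5 MB means 1.5 × 1024 KB; if the size is being reported in decimal megabytes, the last figure comes out a few points lower.

```python
# File sizes from the four tests above, in KB.
# Assumption: 1.5 MB is taken as 1.5 * 1024 KB; the exact byte count wasn't recorded.
ORIGINAL_KB = 872

results_kb = {
    "acrobat_image_exact": 896,
    "dtpo_150dpi_50pct": 598,
    "dtpo_same_as_scan": 1.5 * 1024,
}


def percent_change(new_kb, base_kb):
    """Return the size change relative to the base, as a percentage."""
    return (new_kb - base_kb) / base_kb * 100


# Percentage change of each processed file against the original scan.
changes = {
    label: percent_change(size_kb, ORIGINAL_KB)
    for label, size_kb in results_kb.items()
}

for label, change in changes.items():
    print(f"{label}: {change:+.1f}% vs original")
```

That works out to roughly +3% for Acrobat, about -31% for the 150dpi & 50% setting, and somewhere in the 70s (percent increase) for “same as scan”.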

I ran a further test on another one-page greyscale file. This time the “same as scan” option decreased the file size by half. I don’t know what’s going on here.