Large PDF file growth when OCRing

Using DPO 2.5.1, a 203-page scanned document of 49.5 MB grows to 130.5 MB after I OCR it within DPO.

If I instead use "Searchable PDF Converter", which came with my ScanSnap iX500, for the OCR, the same file grows to only 50.8 MB.

I believe the ScanSnap software uses ABBYY FineReader for OCR, just like DPO. So why does OCRing within DPO cause such a large increase in file size?

thx

hf

DEVONthink Pro Office Preferences > OCR lets you either retain the resolution of the original scan image or set the resolution (dpi) of the searchable PDF.

For most scans, I set the resolution to 130 dpi and 50% image quality. I’m satisfied with the view/print appearance of the searchable PDFs, and their file size is usually less than that of the original scanner output file.
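To give a rough sense of why lowering the resolution helps so much, here is a minimal sketch of the pixel-count arithmetic. It assumes a US Letter page (8.5 x 11 in) and ignores compression entirely, so it does not predict actual PDF sizes; it only shows that the number of pixels per page scales with the square of the dpi.

```python
# Rough pixel-count arithmetic for one scanned page.
# Assumption: US Letter page size; compression (and the 'quality' setting) is ignored.

PAGE_WIDTH_IN = 8.5   # assumed page width in inches; adjust for A4 etc.
PAGE_HEIGHT_IN = 11.0  # assumed page height in inches

def pixels_per_page(dpi: int) -> int:
    """Number of pixels in one page image at the given dpi."""
    return round(PAGE_WIDTH_IN * dpi) * round(PAGE_HEIGHT_IN * dpi)

def pixel_ratio(new_dpi: int, old_dpi: int) -> float:
    """Fraction of the original pixel count left after resampling to new_dpi."""
    return pixels_per_page(new_dpi) / pixels_per_page(old_dpi)

if __name__ == "__main__":
    for dpi in (300, 130):
        print(f"{dpi} dpi: {pixels_per_page(dpi):,} pixels per page")
    print(f"130 vs 300 dpi: {pixel_ratio(130, 300):.1%} of the pixels")
```

Going from 300 dpi to 130 dpi leaves under a fifth of the pixels (about 18.8%), which is why the resampled PDF can end up smaller than the original scan even after the OCR text layer is added.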

Hi Bill:

I understand that your recommendation would shrink the file. But what is DPO doing differently from the ScanSnap software, which uses the same underlying OCR engine, to make the file grow so much? I checked, and the original file and both OCR'd files have the scanned images at 300 dpi.

I’ve also noticed this.

If I scan and OCR using the ScanSnap software, the file size is considerably smaller than if I leave the OCR to DTPO.

In DTPO I've set the resolution to “same as scan” and the quality to 75% (which I think are the defaults). The OCR speed is set to automatic.

In ScanSnap, I set the compression ratio to “3” and leave everything else to automatic.

A two-page PDF with handwriting and some typed text is 456 KB if ScanSnap does the OCR, but 1 MB if DTPO does it.

If I turn the “quality” down to 50% in DTPO, the file size is 760 KB.

How does the “quality” setting alter the image quality? For documents that are mostly black and white, what setting would you recommend to balance image quality against file size?

Thanks.

The ‘quality’ setting is relevant for color images. So is the Compression setting in ScanSnap Manager.

Hi Bill

Is the OCR in DTPO more accurate, which would account for the larger file size? I had thought it used the same engine as ScanSnap.

Thanks.

ABBYY hasn't released an update to the OCR module we license from them. Perhaps in the future.

That seems long overdue. I've noticed quite a few ABBYY FineReader Express app updates on MacUpdate since the last OCR module update. There have also been ABBYY FineReader for ScanSnap updates.