File size increase after OCR


I’m very new to DEVONthink.
I recently imported some PDF files scanned in color (compressed) at 200 dpi.
Their file sizes are roughly 200 KB per page. After making the PDFs searchable, their sizes are roughly 3 MB.
For a full-color scan or a higher dpi value, the file size increases to as much as 10 MB per page.

Am I doing something wrong? Do I have the wrong settings, or is this the expected behaviour of DEVONthink?

Thanks a lot for your support!

I recommend that scans be made at a resolution of 300 dpi, as 200 dpi (especially for color) may not produce images with sufficient resolution of text characters, and so degrade OCR accuracy.

I suspect that the option in Preferences > OCR to retain the resolution of the original image is checked. That usually results in large file sizes. When the image layer is re-rasterized, Apple’s code in OS X works but doesn’t apply any compression, so files can be large.
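To give a feel for why an uncompressed image layer gets so big, here is a rough back-of-the-envelope sketch in Python. It assumes an A4 page (8.27 x 11.69 inches) and 24-bit color (3 bytes per pixel); neither figure comes from the original post, and the function name is just for illustration. It is an upper bound for a raw raster, not a prediction of what DEVONthink actually writes.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Estimate the raw (uncompressed) size of one scanned page in bytes.

    Assumes 24-bit color by default (3 bytes per pixel).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed page size: A4 in inches (not stated in the thread).
A4_INCHES = (8.27, 11.69)

for dpi in (200, 300):
    size = uncompressed_page_bytes(*A4_INCHES, dpi)
    print(f"{dpi} dpi: about {size / 1e6:.1f} MB per page uncompressed")
```

Under those assumptions a 200 dpi color page comes out at well over 10 MB uncompressed, which is the same order of magnitude as the sizes reported above, while a JPEG-compressed scan of the same page can easily be a few hundred KB.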

If your scanner produces clean, high-contrast scans (like my ScanSnap set for 300 dpi black & white scans or 600 dpi for color), you might experiment with unchecking the above option and setting the dpi and image quality instead. There may be some degradation of the image layer of the searchable PDF, but there can be major savings in file size.

Most of my OCR is done with settings in DEVONthink Pro Office Preferences > OCR of 130 dpi and 50% image quality. The resulting view/print quality of the searchable PDFs is acceptable to me, and the PDFs are usually significantly smaller than the original scanner output files.
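As a rough illustration of how much the dpi setting alone can save, the pixel count scales with the square of the resolution. The sketch below (Python, with a hypothetical helper name) compares a 130 dpi image layer with a 300 dpi scan. It ignores the 50% image quality setting, whose effect depends on the compressor and the page content, so treat it as an approximation only.

```python
def pixel_fraction(target_dpi, source_dpi):
    """Fraction of the original pixel count left after downsampling.

    Both width and height shrink by target_dpi / source_dpi,
    so the pixel count shrinks by the square of that ratio.
    """
    return (target_dpi / source_dpi) ** 2


fraction = pixel_fraction(130, 300)
print(f"130 dpi keeps about {fraction:.0%} of the pixels of a 300 dpi scan")
```

So even before any image compression is applied, the stored image layer holds less than a fifth of the pixels of the original 300 dpi scan.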

It’s also possible to shrink existing large PDF files with an external application such as PDF Shrink or Adobe Acrobat, or to re-save the PDF using a Quartz Filter option (Google for custom Quartz Filter settings).

Many thanks for your feedback.

I experimented with the OCR settings of DEVONthink.
I’m not sure whether I understand the process correctly.
First I scan the document to a PDF at a high dpi value, e.g. 300.
Then I import the document into DEVONthink with the settings you provided (130 dpi and 50% quality). So with the good-quality scan, DEVONthink can produce a better OCR result and then compress the file?

If I’m right, it is better to scan at a higher quality and then let DEVONthink compress the document.