OCR and PDF quality

Hi there,

I am switching from DV2 to DV3 and am trying to find the best OCR configuration.
As I can no longer control the compression rate, I am trying to find out how letting ScanSnap do the OCR compares with letting DV3 do it. As I understand it, both use ABBYY, but in different versions.

After some tests I cannot say conclusively which one does a better job regarding PDF quality and text recognition, but there are some noticeable differences.

PDFs with OCR generated by ScanSnap:

- are bigger
- seem to have slightly worse OCR quality on some of my test documents
- are generated more quickly
- are displayed faster within DEVONthink

The display speed seems to come from a different compression algorithm: I can see a quick but poor preview that sharpens after about a second.

I like this behavior and would like to have it with the integrated OCR conversion too. Can I control this anywhere? What are your experiences with OCR using ScanSnap vs. DEVONthink?


By default, DEVONthink compresses the OCR’d PDF using an option that balances the quality of the resulting file, its size, and the processing time. If you require a high-quality image, turn off the “Compress PDF” option in the OCR section of the preferences. This optimizes the PDF export to generate the best-quality file. Note, however, that it may take longer to generate the file than with the default setting.

In my experience doing heavy OCR, the result can be better or worse with any given tool, depending on the original document.

I only use DT's OCR capabilities for trivial captured stuff. Normally I use PDF Pen (which does not touch the image quality but tends to fail to OCR some pages) and Cisdem PDF Creator, which strikes a very good balance between compression and image quality. Sometimes I get a 400 MB scanned PDF, and after OCR with Cisdem it is about 80 MB, with no noticeable difference in image quality even at high zoom.
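
To put the 400 MB to 80 MB figure in perspective, here is a minimal Python sketch of the arithmetic. The function name `size_reduction` is just something I made up for illustration; it is not part of any of the tools mentioned, and the sizes are the approximate ones quoted above.

```python
def size_reduction(before_mb: float, after_mb: float) -> tuple[float, float]:
    """Return (percent saved, compression ratio) for two file sizes in MB."""
    if before_mb <= 0 or after_mb <= 0:
        raise ValueError("sizes must be positive")
    percent_saved = (before_mb - after_mb) / before_mb * 100
    ratio = before_mb / after_mb
    return percent_saved, ratio


# Approximate sizes from the scanned PDF example above.
saved, ratio = size_reduction(400, 80)
print(f"{saved:.0f}% smaller, {ratio:.1f}:1")  # prints "80% smaller, 5.0:1"
```

The same helper could be used to compare the file sizes of the same test document processed by ScanSnap and by DEVONthink.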