Converting to Searchable PDF results in massive PDF files

First off, happy Thanksgiving to all, or at least to those in the US. :smiley:

I’ve been using DevonThink Pro Office for some time now, but I’ve run into an issue with the OCR functionality. I have a number of PDFs that I got off the web (scans of old manuals and the like) that are already in DevonThink, and I recently tried to convert them to searchable PDFs. The “Data --> Convert --> to Searchable PDF” menu option works, but the resulting files are more than 30 times the size of the originals.

One example: I have a 1.7MB, 14-page PDF that ends up being 55MB when converted! Converting back from searchable PDF to plain text, I can see that the text is just a few KB, as expected.
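For anyone who wants to check the numbers, here is a quick sketch of the arithmetic in Python. The variable names are just for illustration; the sizes and page count are the ones quoted above.

```python
# Sizes quoted in the post above (MB) and the page count.
original_mb = 1.7
converted_mb = 55.0
pages = 14

# How many times larger the converted file is than the original.
ratio = converted_mb / original_mb

# Average size per page before and after conversion.
original_mb_per_page = original_mb / pages
converted_mb_per_page = converted_mb / pages

print(f"growth factor: {ratio:.1f}x")
print(f"per page: {original_mb_per_page:.2f} MB -> {converted_mb_per_page:.2f} MB")
```

Since the recognized text is only a few KB, nearly all of that growth has to come from the page images rather than from the added text layer.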

In my OCR preferences I have Resolution set to Same as Scan, Quality is 100%, and Recognition is set to Automatic.

Is there anything I can do that will maintain the quality of the original PDF but not result in gigantic file sizes?

I’m also interested in getting a ScanSnap for document scanning. Is this expected behavior when using that for scanning as well?

If you are willing to accept some degradation in the view/print quality of the searchable PDFs, experiment with unchecking the DEVONthink Pro Office Preferences > OCR option to retain the original scan resolution, and instead set the dpi and image quality options as you wish.

I do most of my scans using a ScanSnap, with ScanSnap Manager set for Black & White scans. The resulting clarity of those scans lets me set Preferences > OCR to 130 dpi and 50% image quality, resulting in searchable PDFs that are better than FAX quality, and that are usually significantly SMALLER than the original scanner output file.
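To see roughly why the dpi setting matters so much, here is a minimal Python sketch of how the uncompressed image data for one page scales with resolution and color depth. It assumes a US Letter page (8.5 x 11 inches) and a 300 dpi source scan, neither of which comes from this thread, and `raw_page_bytes` is just a hypothetical helper name. Real PDF sizes will be much smaller because the images are compressed, so treat these as relative figures only.

```python
def raw_page_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one page image.

    The page dimensions default to US Letter, which is an assumption.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed 300 dpi source scan vs. the 130 dpi setting mentioned above.
for dpi in (300, 150, 130):
    color = raw_page_bytes(dpi, 24)  # 24-bit color
    bw = raw_page_bytes(dpi, 1)      # 1-bit black & white
    print(f"{dpi} dpi: color {color / 1e6:.1f} MB, B&W {bw / 1e6:.2f} MB (raw)")

# Pixel count scales with the square of the resolution.
print(f"130 dpi keeps {(130 / 300) ** 2:.0%} of the pixels of a 300 dpi page")
```

The point is only that halving the resolution cuts the pixel count to about a quarter, and that a black & white page carries far less raw data than a color one, before the image quality setting is even applied.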

I tried converting at 150 dpi with varying quality settings and was able to get much smaller file sizes. The results are still a few MB bigger than the originals, but that’s no big deal.

Why does DevonThink degrade or change the quality of the original document at all? It would be great if it simply preserved the original file and added the resulting plain text output to make it searchable.

Thanks for the help, Bill.