OCR bug for certain pdfs on DEVONthink Pro 3.8.6

Hi Jim (& Alan)

Thanks very much for fixing the image degradation on OCR - it looks so much better. 🙂

I’m getting some big increases in file sizes when I apply OCR and thought it might help if I posted a summary in case anyone else is having the same experience.

Attached is an example - original pdf is 287 KB, after OCR it’s 11 MB.

Before - 287 KB.pdf (280.0 KB)

After - 11 MB.pdf (10.4 MB)

I’m using an M1 MacBook Air, macOS Monterey v 12.6.

OCR preferences in DT below:

I tried setting PDF Resolution to 200 dpi instead of “As source”, but it still came out at 11 MB.

The original PDF already had an OCR text layer, but ever since the PDFKit font issues I redo the OCR in DT (and haven’t had a corrupt PDF since).

I recall file size increases on OCR being raised in this forum a couple of years ago as an issue.

I thought the increase in file size might be ABBYY doing something strange, so as a comparison I exported the original (287 KB) PDF to TIFF using Acrobat DC and recombined the TIFF pages (again using Acrobat DC), which produced an 11.7 MB file. Applying OCR in Acrobat took it up to 11.8 MB - roughly the same as ABBYY (a bit bigger, in fact).

So I’m guessing it might be a function of ABBYY converting to an image? That could be a good thing in that it takes off the existing PDF layers which can cause the corrupt OCR font issue in PDFKit.
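As a rough sanity check on the "converting to an image" guess, here is a back-of-envelope sketch of how big a page is once it becomes a bitmap. The A4 page size and 8-bit greyscale depth are my assumptions (I haven't checked what ABBYY actually produces), and the figures are before any compression, so real files will be smaller.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one page rasterized at the given dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: A4 in inches. Assumed depth: 8-bit greyscale.
A4_IN = (8.27, 11.69)

for dpi in (150, 200, 300):
    size = raw_page_bytes(*A4_IN, dpi, 8)
    print(f"{dpi} dpi: {size / 1_000_000:.1f} MB per page before compression")
```

Even with good compression, a few image pages could plausibly add up to several MB, whereas a text-based PDF only needs to store the characters and fonts.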

If the extra MB is the price of not having unreadable OCR fonts because of the PDFKit issue, then for me the extra storage is a price worth paying. But if you know of anything obvious that I’m doing wrong, please let me know.

It’s not a big or urgent issue, though.

Thanks very much, as always.