Further degradation of OCR PDF image layer in 3.0.2

Pete248 · December 30, 2019, 7:10pm

I still see some degradation in image quality in 3.0.3, though it is much better than in 3.0.2.

In the meantime I used PDFpen Pro for OCR, but I have now switched back to DT3/FineReader because FineReader's OCR is better.

That said, I miss the OCR speed of PDFpen Pro, which uses all 6+6 CPU cores in my cMP, while FineReader uses only one core. OCRing a 30-page document in PDFpen Pro takes a few seconds. It would be nice if FineReader used more cores for multipage documents in the future.

rholmes777 · January 8, 2020, 9:01pm

I just noticed this problem in DTPO 3.0.3 (I'm not sure whether I had done conversions earlier without checking the output). I ended up with double the file size and at least 50% degradation in quality, to the point that the finished PDF is unusable.

I did check that the helper app was version 1.0.25, and it actually seems worse than earlier versions of DTPO 3.x. Unfortunately I don't have other software with which to OCR, so I'm very anxious for a fix and won't be able to OCR until this is resolved.

Any updates on when a fix will be available? Thanks!

Screenshots attached.

bcarpenter · March 21, 2020, 2:26am

I'm not sure which of the many threads about this problem I should post in. Image degradation has been a problem with ABBYY & DT for at least 10 years, and with just about every other PDF compressor that seems to use the built-in macOS PDF tools. While it may work okay for greyscale & colour images, any 1-bit image is converted to a greyscale image and then compressed, which increases the file size and degrades the image quality.

I've just tested DT 3.0.4, and while the 1-bit image retains sufficient quality, the single-page file size blew out from 217kB to 690kB. Using Acrobat Pro, I get these file sizes down to about 50kB (including OCR).

PDFpen seems to retain image quality, but doesn't reduce file size well. In fact, I have found nothing except Adobe Acrobat Pro that works for me. I want to OCR, then reduce file sizes by making 1-bit pages 300dpi and all other pages 150dpi, while compressing the page images. And I want to batch-process a whole folder this way.
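For a rough sense of scale behind the 1-bit-to-greyscale point above, here is a minimal Python sketch that computes raw, uncompressed bitmap sizes for one page. It assumes a US Letter page (8.5 × 11 in); the function name `raw_page_bytes` is hypothetical. Real PDFs compress these bitmaps, so only the ratios carry over, not the absolute numbers.

```python
# Raw (uncompressed) bitmap size of one scanned page.
# Assumption: US Letter, 8.5 x 11 in. PDFs normally compress image data
# (e.g. CCITT/JBIG2 for 1-bit, JPEG for greyscale), so actual file sizes
# are much smaller; only the ratios between cases are meaningful here.

def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Return the uncompressed size in bytes of a page bitmap (hypothetical helper)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # PDF image data starts each row on a byte boundary, so round each row up.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


bilevel_300 = raw_page_bytes(8.5, 11, 300, 1)  # original 1-bit scan
grey_300 = raw_page_bytes(8.5, 11, 300, 8)     # same page converted to 8-bit greyscale
grey_150 = raw_page_bytes(8.5, 11, 150, 8)     # greyscale downsampled to 150 dpi

print(f"1-bit     @300 dpi: {bilevel_300:>10,} bytes")
print(f"greyscale @300 dpi: {grey_300:>10,} bytes")
print(f"greyscale @150 dpi: {grey_150:>10,} bytes")
print(f"greyscale/1-bit at 300 dpi: {grey_300 / bilevel_300:.0f}x")
```

Converting a 1-bit page to 8-bit greyscale at the same resolution multiplies the raw data by about 8 before any compression runs, which is consistent with file sizes growing rather than shrinking.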

chreliot · March 21, 2020, 6:55pm

Does anyone have experience comparing ABBYY’s standalone offering with any of these others?

@bcarpenter’s experience seems to be that DT 3.0.4 improves image quality, though at a cost of large file size. (For me, for research articles rather than receipts and similar, that’s a happy trade-off.) And PDFPen does the same. Acrobat Pro has smaller file-size, but I don’t think its OCR is as accurate as ABBYY’s. Adding DT’s option to check “Compress PDFs” to the mix then leaves us with a good range of trade-offs, variously optimizing for two of file-size, excellent OCR, and image-quality, even if no option yields all three.

dcg2308 · September 9, 2020, 1:38am

I have long used the built-in OCR and never noticed image degradation as bad as it is at the moment. In the example below, OCR takes the page from readable to almost unreadable. Is there a recommended solution within DEVONthink?

aedwards · September 9, 2020, 8:29am

What settings are you using for the OCR? Turning off automatic deskew correction improves the output, see image below.

ABBYY generates the output image after its preprocessing, so the blurring could be caused by a very small deskew correction.

dcg2308 · September 9, 2020, 10:25am

Thank you for this suggestion; I wasn't aware of that setting. I turned off deskewing and re-OCR'd the document, but unfortunately the result is still not good.

Maybe there's something else I'm doing wrong; as I said, I've used the built-in OCR for years without problems. I'm aware the quality of what I'm trying to OCR is not great (I think it's a scan of a microfilm), so I certainly don't expect the OCR to be perfect. I'm just unhappy that the OCR now blurs the document, making it difficult or at least tiring to read.

aedwards · September 9, 2020, 2:01pm

Can you take a screenshot of your OCR preferences? I will see what is different from the settings I am using.

dcg2308 · September 9, 2020, 10:05pm

Here they are. I also tried turning off 'page orientation', but the results were about the same.

aedwards · September 17, 2020, 2:32pm

We released an update to the OCR this week; did that help? If not, try increasing the resolution to, say, 200 dpi.

cblaha · September 17, 2020, 3:32pm

Where do I see the OCR-version in 3.5.2?

OK, I found it:
/Abbyy/DTOCRHelper.app => v1.1.14

BLUEFROG · September 17, 2020, 3:37pm

1.1.14 is the updated version.

aedwards · September 17, 2020, 3:47pm

Details of the OCR update and how to install it can be found here:

Blanc · September 18, 2020, 8:21am

If I follow those instructions, ABBYY FineReader OCR remains greyed out and cannot be selected, even though I have DTOCRHelper.app v1.1.13. Any tips?

aedwards · September 18, 2020, 8:48am

Try restarting DEVONthink; the OCR version file is downloaded on startup.

Blanc · September 18, 2020, 3:24pm

That worked, cheers

dcg2308 · September 20, 2020, 10:36pm

Thank you! OK, I installed the new OCR version and tried the same document again. Here are the results.
This is the document before OCR:

This is the document OCR'd at 200 dpi. In this case none of the text was selectable:
This is the document OCR'd at 300 dpi. In this case the text is selectable only as shown:
Again I understand I am starting with a not-great quality document. I do feel the OCR results used to be better than this but can’t demonstrate that.

aedwards · September 21, 2020, 8:11am

I haven't been able to reproduce these results. Is it possible to send a copy of the original document? I have sent you a message with the email address to send it to.