OCR fails from scan import in Catalina

avatar · July 28, 2020, 5:18pm

As I mentioned, the PDF is indeed listed as “PDF+Text”.

The weird bit is that I can find text on the page, but just no highlighting when found, and more importantly, no selecting!

This is quite important for me as I regularly scan in lots of printed documents for OCRd PDFs.

BLUEFROG · July 28, 2020, 5:24pm

Ahh… Sorry, I missed that.

I’m curious: If you run OCR on the file again, does it behave the same or differently?

avatar · July 28, 2020, 5:30pm

Aha! Well I got the warning dialog, “Are you sure you want to convert this searchable PDF again …”, but I clicked the ‘Convert’ button anyway, and on the newly created file, the text is selectable.

Hooray. But boo also, as I don’t really want to do it twice on every scan. But thanks for the suggestion as a workaround. It seems to work.

avatar · July 28, 2020, 5:43pm

Trying that “re-converting” workaround again using 300dpi (I usually use 150), I thought that the app had hanged (hung?) on the process. It took a long time - 10 minutes or more - and the DTOCRHelper process swallowed 15GB of RAM. (I only tried 300dpi thinking it might help. It didn’t.)

BLUEFROG · July 28, 2020, 6:06pm

This should be resolved in the next maintenance release.

avatar · July 28, 2020, 6:14pm

Great. I’ll look forward to that fix. Thanks.

BLUEFROG · July 28, 2020, 6:20pm

No problem.

amalis · August 5, 2020, 3:44pm

I’ve encountered the same issue with a ScanSnap ix500, macOS 10.15.6, and DEVONthink 3.5.1. Until the maintenance release is available, I’ve been scanning to disk with ScanSnap Manager doing the OCR, then importing into DT.

avatar · August 10, 2020, 7:22am

Yes, I’ve had to rely on other software as well, as the “converting twice” method can be hit and miss. Hardly “no problem” really. I’ll be glad when I can scan documents straight into DT3 again, using DT3.

avatar · August 13, 2020, 3:20pm

The problem is still not fixed in the latest 3.5.2 update!

The ABBYY download happened, but scanned documents, OCRd, produce no selectable text!

Furthermore, as has happened before, if I opt for the scan to go to a “new binder”, the scan process completely forgets about that as soon as I click scan.

BLUEFROG · August 13, 2020, 3:48pm

What kind of scanner are you using?

avatar · August 13, 2020, 3:51pm

Epson Perfection 4990.

FineReader has no problems OCR-ing it, but with DT3 I always have to scan twice!

(that is, not “scan twice”, but after the unsuccessful OCR, go for the OCR > to searchable PDF)

BLUEFROG · August 13, 2020, 5:31pm

Hmm… I’m doing a scan with an HP OfficeJet 9010 in DEVONthink’s Import sidebar, with OCR enabled.

The text is fully selectable in the finished file.

Outside of manually reinstalling the OCR components, @aedwards would have to comment on this further.

avatar · August 13, 2020, 5:52pm

Well, it isn’t for me. Yes, import sidebar, OCR enabled - of course. Considering that FineReader successfully OCRs documents, there’s no reason to think that it’s anything other than DT3 at fault (and especially since we saw this identically behaving bug in the previous release version).

avatar · August 13, 2020, 8:20pm

Looking around similar threads I notice that re-installing the ABBYY DTOCRHelper application seems to help. I find I have an older version 1.1.2 (as opposed to the version 1.1.13 installed today with the DT3 3.5.2 update). Should I try replacing the newer version with the old? It seems a bit of a kludge.

BLUEFROG · August 13, 2020, 8:23pm

As I said previously,

Outside of manually reinstalling the OCR components, @aedwards would have to comment on this further.

avatar · August 13, 2020, 8:28pm

Uh … OK …

200

BLUEFROG · August 13, 2020, 8:47pm

Hahaha! I don’t understand your message.

avatar · August 13, 2020, 8:49pm

I’m waiting with trepidation for the pronouncement from @aedwards

BLUEFROG · August 13, 2020, 8:54pm

If it’s working with the version you have currently installed, you can proceed until he has a chance to chime in.