There appears to be a problem with OCR on import via scanning in DT3 version 3.5.1 running on macOS 10.15.6.
I scanned in a few pages of text documents to make a PDF, as I’ve done hundreds of times before, with OCR selected of course. The PDF was made (I noticed “recognising”, etc. in the activity panel at bottom left), and the file got listed as “PDF+Text”.
I thought all was well.
But when I tried to select any text in the PDF document, I couldn’t. Often it would just select/highlight the entire page - as if the OCR had actually failed, despite the file being listed as “PDF+Text”.
In the end I rescued the trashed JPG files, opened one in Preview, added the others one by one to Preview’s ‘thumbnails’ sidebar, “printed” the result as a PDF to the desktop, then imported that printed file into DT3 and ran “Convert > to searchable PDF” on it. That did result in a properly OCRd file.
But it won’t work from the import/scan process.
The ABBYY FineReader OCR “extra” is installed, btw.
I just restarted DT3 and tried again, with the same results.
As a rider to my remarks, I notice that, curiously, searching for some text (that you know is there) in one of these “text not selectable” apparently OCRd PDFs does find the text on the page. It’s just not selectable, and the found text isn’t highlighted either.
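For anyone who wants to check outside DT3 whether such a file carries any text layer at all (which would fit with search finding the text), here is a rough sketch in Python using only the standard library. It is a crude heuristic, not a real PDF parser: it assumes that an OCRd PDF declares font resources (`/Font`) somewhere, either in plain view or inside a Flate-compressed stream, and that an image-only scan does not. The function name `has_font_resources` is made up for illustration, and a positive result says nothing about whether the text is selectable in DT3.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def has_font_resources(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the PDF appears to declare any font resources.

    Assumption: an OCR text layer needs a /Font entry, while an
    image-only scan has none. This can be wrong for unusual files.
    """
    # Case 1: the font dictionary is stored uncompressed.
    if b"/Font" in pdf_bytes:
        return True

    # Case 2: the dictionary sits inside a Flate-compressed stream
    # (for example an object stream in PDF 1.5 and later).
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes before "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. a JPEG image stream)
        if b"/Font" in data:
            return True
    return False
```

To try it on a file, read the bytes first, for example `has_font_resources(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name.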
Aha! Well I got the warning dialog, "Are you sure you want to convert this searchable PDF again … ", but I clicked the ‘Convert’ button anyway, and on the newly created file, the text is selectable.
Hooray. But boo also, as I don’t really want to do it twice on every scan. But thanks for the suggestion as a workaround. It seems to work.
Trying that “re-converting” workaround again using 300 dpi (I usually use 150), I thought that the app had hung on the process. It took a long time - 10 minutes or more - and the DTOCRHelper process swallowed 15 GB of RAM. (I only tried 300 dpi thinking it might help. It didn’t.)
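For a rough sense of scale between the two resolutions, the arithmetic can be sketched in Python. The page size here is an assumption (A4, 210 x 297 mm), and `scan_pixels` is just an illustrative helper. Doubling the dpi quadruples the number of pixels the OCR engine has to process, though that alone may not account for the 15 GB of RAM.

```python
MM_PER_INCH = 25.4


def scan_pixels(dpi: int, width_mm: float = 210.0, height_mm: float = 297.0) -> int:
    """Approximate pixel count of one scanned page (A4 assumed by default)."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px


low = scan_pixels(150)   # my usual setting
high = scan_pixels(300)  # the setting that seemed to hang
print(f"150 dpi: {low / 1e6:.1f} MP per page")
print(f"300 dpi: {high / 1e6:.1f} MP per page")
print(f"ratio: {high / low:.1f}x")
```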
I’ve encountered the same issue with a ScanSnap iX500, macOS 10.15.6, and DEVONthink 3.5.1. Until the maintenance release is available, I’ve been scanning to disk with ScanSnap Manager doing the OCR, then importing into DT.
Yes, I’ve had to rely on other software as well, as the “converting twice” method can be hit and miss. Hardly “no problem” really. I’ll be glad when I can scan documents straight into DT3 again, using DT3.
The problem is still not fixed in the latest 3.5.2 update!
The ABBYY download happened, but scanned documents, OCRd, produce no selectable text!
Furthermore, as has happened before, if I opt for the scan to go to a “new binder”, the scan process completely forgets about that as soon as I click scan.
Well, it isn’t for me. Yes, import sidebar, OCR enabled - of course. Considering that FineReader successfully OCRs documents, there’s no reason to think that it’s anything other than DT3 at fault (especially since we saw this identically behaving bug in the previous release version).
Looking around similar threads, I notice that re-installing the ABBYY DTOCRHelper application seems to help. I find I have an older version 1.1.2 (as opposed to version 1.1.13, installed today with the DT3 3.5.2 update). Should I try replacing the newer version with the old? It seems a bit of a kludge.