So I’m adding PDFs to DevonThink, and usually I don’t need to scan them to highlight or otherwise select the text.
However, I’ve had trouble with a few PDFs lately. When added to DevonThink, they are already classified as “PDF + Text” files, and yet the only part of the article that acts like it has a text layer is the front page of the PDF, with the electronic database’s watermark and information. Everything else behaves like a regular image-only PDF.
Some journals and news archives supply PDFs like that, with only a header section as PDF+Text.
You can try a conversion to searchable PDF (Data > Convert > to searchable PDF). The results may vary depending on the resolution and quality of the PDF image. I’ve had a couple like that and got reasonably good OCR accuracy.
The OCR Activity window does not open if it was previously closed while it was displaying the status of an active scan. To restore the normal behavior, where the window opens automatically when you do a scan, start an OCR scan, select Window > OCR Activity, let the scan finish, then close the activity window. (The OCR Activity window does not behave like the Log window, which always opens when DT has something to report in the log.)
I’ve tried again, and both the OCR and general Activity logs show no activity at all. With the PDF displayed in the main display area, clicking on “Convert to Searchable PDF” produces no reaction at all. Is it because DT already recognizes it as a “converted” PDF, even though most of the contents are not selectable/highlightable/etc.?
Is there some kind of general OCR problem with the current version of DTPO (2.0pb7)? I am also having trouble OCR-ing PDFs. I click on Convert > to Searchable PDF and nothing happens. When I open the OCR Activity window, it’s empty and there is no activity. Any ideas what might be going on?
As usual, the best thing is to send this document to support@devon-technologies.com with a reference to this thread so that we can check it here. Thanks!