Stubborn OCR

So I’m adding PDFs to DevonThink, and usually I don’t need to scan them to highlight or otherwise select the text.

However, I’ve had trouble with a few PDFs lately. When added to DevonThink, they are already classified as “PDF + Text” files, and yet the only part of the article that acts like it has a text layer is the front page of the PDF, with the electronic database’s watermark and information. Everything else behaves like a regular PDF image.

Some journals and news archives supply PDFs like that, with only a header section as PDF+Text.

You can try a conversion to searchable PDF (Data > Convert > to searchable PDF). The results may vary depending on the resolution and quality of the PDF image. I’ve had a couple like that and got reasonably good OCR accuracy.

Sorry - I didn’t clarify. It’s been a long day.

When I try to “Convert to Searchable PDF” nothing happens. DT doesn’t seem to respond in the usual way, which is to pop up the work queue.

Sorry for the stupid question: are you sure nothing happens?

From time to time DTPO fails to pop up the OCR Activity window automatically, so I have to show it manually by clicking the corresponding menu item.

Since the OCR process can take quite a while, in such a case it looks as if nothing is happening.

The OCR Activity window does not open if it was previously closed while it was displaying the status of an active scan. To restore the normal behavior, in which that window opens automatically when you do a scan, start an OCR scan, select Window > OCR Activity, let the scan finish, then close the activity window. (The behavior of the OCR Activity window is not the same as that of the Log window, which always opens when DT has something to report in the log.)

I’ve tried again, and both the OCR and general Activity logs show no activity at all. With the PDF displayed in the main display area, clicking on “Convert to Searchable PDF” produces no reaction at all. Is it because DT already recognizes it as a “converted” PDF, even though most of the contents are not selectable/highlightable/etc.?

Please send the PDF as an attachment in a message to Support, mentioning that you are unable to run Data > Convert > to searchable PDF on it.

Is there some kind of general OCR problem with the current version of DTPO (2.0pb7)? I am also having trouble OCR-ing PDFs. I click on Convert > to Searchable PDF and nothing happens. When I open the OCR Activity window, it’s empty and there is no activity. Any ideas what might be going on?

Check the Console (in Applications > Utilities) for any messages. Normally an error should be logged in the Log tool.

My guess is that in the past you indicated that you didn’t want to see the warning for files that have been converted already.

Thanks for the quick reply. I checked the console, and I can’t find an entry.

BTW, the file is not already converted. It’s a Google Book PDF, which is an image file where only the first page is a PDF+Text.

As usual, the best thing is to send this document to Support with a reference to this thread, and then we can check it here. Thanks!