How to tell if OCR'd?

pgseye · October 26, 2013, 8:46pm

Hi,

I have a folder of a few hundred large documents that I want to index in DTP in order to perform searches. However, I don’t know if all of them have been OCR’d. Is there a way for DTP to check and then automatically OCR a document as it’s indexed, or at least tell you whether a group contains documents that haven’t had text recognition performed on them?

Thanks.

Bill_DeVille · October 26, 2013, 9:10pm

Select the group in DEVONthink. If the Kind column is not already shown, add it to the view window (View > Columns > Kind).

PDFs that contain searchable text have Kind = PDF+Text. PDFs that do not contain searchable text have Kind = PDF. The latter are candidates for OCR (although a PDF that doesn’t contain images of printed text won’t produce searchable text after OCR, e.g., a PDF image of a photo or of handwritten text).
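To illustrate the rule above outside of DEVONthink, here is a minimal Python sketch that splits a list of documents by their Kind value. It assumes you have somehow obtained each document's name and Kind as plain text; the `Record` type, the `ocr_candidates` function, and the sample file names are all hypothetical and are not part of DEVONthink.

```python
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical row: a document name plus the text shown in the Kind column."""
    name: str
    kind: str


def ocr_candidates(records: Iterable[Record]) -> List[str]:
    """Return names of PDFs with no text layer (Kind is exactly 'PDF').

    Documents with Kind 'PDF+Text' are already searchable, and
    non-PDF kinds are ignored.
    """
    return [r.name for r in records if r.kind.strip() == "PDF"]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Record("scan-001.pdf", "PDF"),
        Record("report.pdf", "PDF+Text"),
        Record("notes.rtf", "RTF"),
    ]
    for name in ocr_candidates(sample):
        print(name)
```

As noted above, a match here only means the PDF is a candidate for OCR; a PDF of a photo or of handwriting will still not yield searchable text.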

If the Indexed group contains PDFs that require OCR, select those and choose Data > Convert > to searchable PDF. A new searchable copy of the PDF will be created and stored within the database. If the option to move the original to the Trash after OCR is enabled in DEVONthink Pro Office Preferences > OCR, the original PDF file will be deleted from the external folder.

If you then wish to send the searchable PDF to the external folder and re-Index it to DEVONthink, select it and choose the contextual menu option, Move to External Folder. Note: the selected PDF must be located in an Indexed group for this command to work.

pgseye · October 27, 2013, 5:36am

Many thanks, Bill - works a treat.

Regards,

Paul