Importing scanned images, PDFs, etc., and OCR

ShaunGreen · October 7, 2014, 11:59am

I scanned a document and then dragged and dropped it into DTP, which produced an error message in the log saying “no text”. That makes me believe OCR does not run when importing this way. I then tried selecting import images with OCR (the example was a document scanned on a standard flatbed scanner), which produced no error report.

So now I am curious whether OCR works with drag and drop. Does anyone know? If it does not, is it planned for the next DTP update? Also, for documents already in DTP, is there a way to scan the database and convert the existing documents with OCR?

Shaun

BLUEFROG · October 7, 2014, 12:45pm

No, OCR does not happen automatically just by dropping a file into your database. You can right-click (Control-click) a file and choose “Convert > To Searchable PDF”.

As for a “scan” of the database for conversion, it would be possible to write a script to do this, but I would be cautious about any process that could affect so many files unattended. Potentially as a trigger on a Smart Group of PDFs with a Word count of zero,…
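
A minimal sketch of the selection step in Python, under stated assumptions: `Record`, `find_ocr_candidates`, and `dry_run_report` are hypothetical names, the kind string "PDF" is assumed, and the records are plain stand-ins rather than DEVONthink's scripting interface. It only reports which PDFs have a word count of zero, so nothing is converted unattended.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for a database item; not DEVONthink's API.
    name: str
    kind: str        # assumed values such as "PDF" or "PDF+Text"
    word_count: int


def find_ocr_candidates(records):
    """Return plain PDFs that contain no text (word count of zero)."""
    return [r for r in records if r.kind == "PDF" and r.word_count == 0]


def dry_run_report(records):
    """Describe what would be converted without changing anything."""
    return [f"would convert: {r.name}" for r in find_ocr_candidates(records)]
```

Reviewing such a report before converting anything keeps the process attended.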

Greg_Jones · October 7, 2014, 12:54pm

First, it might be helpful to others to use the acronym DTP to identify DEVONthink Pro (no OCR capability) and DTPO to identify DEVONthink Pro Office, which does have OCR capability.

Drag and drop into DTPO does not automatically convert the document to PDF+Text here, and I do not believe there is a keystroke modifier option when dragging to change the default import behavior. If this is something to be considered in a future update, I hope it will not be the default behavior. I don’t want all my plain PDFs to be converted to PDF+Text. Perhaps adding a keystroke modifier is something that is possible?

Set up a smart group where Kind>Is>PDF/PS, then sort the results by Kind. Sorting this way groups the plain PDFs apart from the PDF+Text documents (which group comes first depends on ascending/descending sort order). Select the desired PDFs, right-click, and select Convert to Searchable PDF. The conversion process takes time and RAM and puts a load on the CPU. If you have a large number of PDFs to convert, you might want to convert them in smaller batches.
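
The batching advice can be sketched in Python as follows. This is only an illustration: `batches` is a hypothetical helper and the file names are made up. The actual conversion would still be done from the right-click menu, one batch at a time.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file names standing in for the selected plain PDFs.
pending = ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"]
for group in batches(pending, 2):
    print("convert together:", group)
```

A smaller batch size keeps the RAM and CPU load of each conversion run lower, at the cost of more manual rounds.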

ShaunGreen · October 7, 2014, 3:11pm

Thanks guys.