Making PDFs searchable after they are scanned

HenkvanEss · April 5, 2013, 9:12am

Hi

ScanSnap does excellent text recognition; however, it takes too long if you have a lot of documents. Is there a way to scan files to plain PDF first and convert them to PDF+ (searchable text) afterwards?

I managed to get a list of all non-searchable PDFs in DEVONthink, but when I try Data > Convert I can’t find a PDF+ button or a Make Searchable option.

What am I doing wrong?

korm · April 5, 2013, 10:26am

Which edition of DEVONthink do you use? Conversion to searchable PDF is a feature of DEVONthink Pro Office only.

HenkvanEss · April 5, 2013, 10:45am

That bugs me even more: I have Pro Office.

korm · April 5, 2013, 11:12am

The forum’s not going to be able to help diagnose a problem with specific documents … you’re better off sending a trouble report to support-at-devontechnologies.com or here and including samples of the PDFs that DEVONthink is not recognizing as convertible. That way the tech staff can see the actual issue and deal with it.

HenkvanEss · April 5, 2013, 11:16am

The question for now is: is it in Pro Office? Should it be under Data > Convert?

korm · April 5, 2013, 11:37am

In DEVONthink Pro Office (only) when a PDF that can be OCRd is selected, then Data > Convert and the contextual menu should show:

Allsop · April 5, 2013, 12:48pm

Having recently upgraded to DTP Office, partly for this very ability to convert PDFs to searchable PDFs, I am very interested in this. Four questions:

Do I have to go through all of my PDFs and convert them one by one?
If so, can I do this from the ‘All PDF Documents’ Group?
How do I know which of the resulting PDFs is the one that has been converted to searchable?
Can I then delete the original, non-searchable PDF, and if so, will deleting it from the ‘All PDF Documents’ Group also delete it from its own Group?

Thanks for your patients & help.
Andrew

Allsop · April 5, 2013, 12:52pm

“…patients” ??? Sorry meant patience!!!

korm · April 5, 2013, 1:23pm

If a PDF has been OCR’d previously, it will show up as “PDF+Text” and no further action is needed. (In fact, converting/OCRing an already-OCR’d PDF makes things worse, not better.)

Yes, or you can select a group of PDFs and have them all converted in a batch (one by one, not simultaneously). I keep a Smart Group defined as follows in my databases so I know at a glance what’s not been converted:

In the Kind column of document views, it says “PDF+Text”. So does Tools > Show Info.

Sure. You can delete the original automatically if you go to DEVONthink > Preferences > OCR and select “Original Document: Move to Trash”

See DEVONthink Help for further instructions on the preference settings…

Allsop · April 5, 2013, 2:00pm

Thanks, korm, as usual.