Duplicates created while Creating Searchable PDF

Damager · June 25, 2012, 9:31pm

I just purchased DevonThink Pro Office and imported a number of PDF documents that are not text searchable. When I ask DevonThink to convert them to searchable PDFs (which seems to work fine), it looks like DevonThink keeps the original and creates a NEW document with the searchable version, i.e. I now have 2 of every document when I look at the database.

Is this normal?
How do I delete the dups, i.e. just the docs with no searchable text?

Greg_Jones · June 25, 2012, 9:44pm

Check ‘Original Document: Move to Trash’ in the OCR prefs.

Damager · June 25, 2012, 11:51pm

Thanks Greg - that did the trick for the bunch that I haven’t converted yet. Is there an easy way to identify the dups I’ve already created?

korm · June 26, 2012, 3:11am

If you make a smart group that looks like this

You’ll find all non-OCRd PDFs. That might also find documents that you don’t want to delete. If so, leave out the “Word Count is” predicate and the smart group will find all PDFs. Sort the smart group results by name and the PDF and PDF+Text pairs will line up next to one another.
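
If the list is too long to scan by eye, the same pairing idea can be sketched in a few lines of Python. This is only an illustration, not anything DevonThink provides: it assumes you have somehow gotten a list of (name, word count) entries out of the database, that the searchable copy has the same name as its original, and that a non-OCRd PDF has a word count of 0. The function name and the sample records are made up.

```python
from collections import defaultdict


def find_unsearchable_duplicates(records):
    """Return names that have both a zero-word PDF and a PDF with text.

    `records` is a hypothetical list of (name, word_count) tuples;
    it is not a DevonThink export format.
    """
    # Group the word counts of all documents that share a name.
    by_name = defaultdict(list)
    for name, word_count in records:
        by_name[name].append(word_count)

    duplicates = []
    for name in sorted(by_name):  # sorted, like sorting the smart group by name
        counts = by_name[name]
        has_original = 0 in counts                    # assumed: no text layer
        has_searchable = any(c > 0 for c in counts)   # assumed: OCRd copy
        if has_original and has_searchable:
            duplicates.append(name)
    return duplicates


# Made-up sample data for illustration only.
records = [
    ("Invoice 2012-05", 0),
    ("Invoice 2012-05", 412),
    ("Scanned letter", 0),
    ("Manual", 1530),
]
print(find_unsearchable_duplicates(records))
```

Only names that show up as a pair are reported, so a zero-word PDF with no searchable twin (like "Scanned letter" here) is left alone rather than flagged for deletion.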