Automatically find non-OCR’ed duplicates

vatolin · June 5, 2014, 8:02am

Morning.

I’ve converted hundreds of PDF files into OCR’ed PDF files, but now I have both the non-OCR’ed files and the OCR’ed files. Is there a way to find all the non-OCR’ed files that have OCR’ed duplicates so that I can delete them?

Thanks in advance for your support!

Kind regards, Friedrich

cgrunenberg · June 5, 2014, 9:06am

It’s possible to find all not yet OCRed PDF documents e.g. via Data > New from template > Smart Groups > PDFs (not searchable). However, this might include additional PDF documents without a duplicate.
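
To illustrate that caveat outside of DEVONthink, here is a minimal Python sketch of the filtering logic. It is not a DEVONthink API: `PdfRecord`, `redundant_originals` and `unmatched_originals` are hypothetical names, and it assumes that an OCRed copy carries the same name as its original and that you already know which documents are searchable.

```python
from typing import List, NamedTuple


class PdfRecord(NamedTuple):
    """Hypothetical record for one PDF: its name and whether it has a text layer."""
    name: str
    searchable: bool


def redundant_originals(records: List[PdfRecord]) -> List[str]:
    """Names of non-searchable PDFs that have a searchable copy with the same name."""
    # Assumption: the OCRed copy keeps the original's name (compared case-insensitively).
    searchable_names = {r.name.casefold() for r in records if r.searchable}
    return [
        r.name
        for r in records
        if not r.searchable and r.name.casefold() in searchable_names
    ]


def unmatched_originals(records: List[PdfRecord]) -> List[str]:
    """Names of non-searchable PDFs with no searchable copy; these must be kept."""
    searchable_names = {r.name.casefold() for r in records if r.searchable}
    return [
        r.name
        for r in records
        if not r.searchable and r.name.casefold() not in searchable_names
    ]


if __name__ == "__main__":
    sample = [
        PdfRecord("Invoice 2013", False),  # original scan
        PdfRecord("Invoice 2013", True),   # its OCRed copy
        PdfRecord("Scan 0042", False),     # not yet OCRed, no duplicate
    ]
    print("Safe to delete:", redundant_originals(sample))
    print("Keep:", unmatched_originals(sample))
```

A "PDFs (not searchable)" smart group corresponds to both lists combined, which is why its contents should not be deleted wholesale.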

vatolin · June 5, 2014, 9:12am

Hmm. So first I have to OCR all of my PDF files (approx. 12,000), and then I can delete all of the non-searchable PDF files, can’t I?

Kind regards, Friedrich

cgrunenberg · June 5, 2014, 10:02am

You could enable the option to move the original documents to the trash once they have been OCRed; see Preferences > OCR.

Doctor_Dave · April 26, 2016, 5:27am

Thank you for this help! I’ve been trying to figure out a way to do this, and you showed me a very easy way to do it.

Dave

BLUEFROG · April 26, 2016, 8:30pm

Yeah, Smart Groups are very cool!