Automatically find non-OCR'ed duplicates


I’ve converted hundreds of PDF files into OCR’ed PDF files, so now I have both the non-OCR’ed originals and the OCR’ed copies. Is there a way to find all the non-OCR’ed files that have an OCR’ed duplicate, so that I can delete them?

Thanks in advance for your support!

Kind regards, Friedrich

It’s possible to find all PDF documents that haven’t been OCRed yet, e.g. via Data > New from template > Smart Groups > PDFs (not searchable). However, the results might also include PDF documents that have no OCRed duplicate.
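To narrow such a list down to files that really do have an OCRed counterpart, something like the following rough sketch might help. It is only an illustration, not a built-in feature: it assumes that each OCRed copy kept the same file name as its original (which may not hold for your files), and that you already have two lists of paths, one for the non-searchable PDFs and one for the searchable ones. The function name `find_redundant_originals` is made up for this example.

```python
from pathlib import PurePath


def find_redundant_originals(non_searchable, searchable):
    """Return the non-searchable paths whose file name also occurs among the searchable ones.

    Assumption: an OCRed copy has the same file name (ignoring folder,
    extension and upper/lower case) as the original it was made from.
    """
    # Collect the bare names of all searchable (OCRed) PDFs.
    ocred_names = {PurePath(p).stem.lower() for p in searchable}
    # Keep only the originals that have a same-named OCRed counterpart.
    return [p for p in non_searchable if PurePath(p).stem.lower() in ocred_names]


if __name__ == "__main__":
    # Hypothetical example paths, for illustration only.
    originals = ["scans/invoice-01.pdf", "scans/letter.pdf"]
    ocred = ["ocr/invoice-01.pdf", "ocr/report.pdf"]
    for path in find_redundant_originals(originals, ocred):
        print(path)
```

Matching on the name alone is deliberately simple. Anything it reports should be checked by hand before deleting, since two unrelated documents can share a file name.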

Hmm. So first I have to OCR all of my PDF files (approx. 12’000), and then I can delete all of the non-searchable PDF files, right?

Kind regards, Friedrich

You could enable the option to move OCRed documents to the trash, see Preferences > OCR.

Thank you for this help! I’ve been trying to figure out a way to do this, and you showed me a very easy way to do it. :smiley:


Yeah, Smart Groups are very cool! :smiley: