I’ve found a few PDFs in my database that were never OCR’d (and are therefore not searchable), mostly things that were emailed to me and that I did not check at the time. I’d like to find all of them and OCR them now.
Does anyone have a copy of this script, or something similar?
I thought I could just use search or advanced search, but while I can do a search on Document Kind, the available choices cannot differentiate “PDF” vs “PDF + Text”.
Can any of you gurus help me out here? Thanks a lot in advance!
Thanks a lot, Jim and Christian! So much depth in DEVONthink, I love it.
(I’d started thinking of checking the output of pdftotext for each file and scripting a walk through the entire database. So much easier this way! Now that I have the smart group, it’s trivial to see if/when new files show up without a text layer.)
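For anyone curious, the pdftotext route I had in mind might look roughly like the sketch below. It is only an illustration, not something I ended up needing: it assumes the `pdftotext` command-line tool is installed and on the PATH, and that the PDFs are reachable as ordinary files under some folder. The function names and the 20-character threshold are my own placeholders.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterator


def extract_text(pdf_path: Path) -> str:
    """Run pdftotext on one file and return whatever text it finds.

    "-" sends the text to stdout instead of a .txt file; "-q" suppresses
    warnings. A failed conversion (damaged or encrypted PDF) returns "",
    so such files will also be reported as lacking text.
    """
    result = subprocess.run(
        ["pdftotext", "-q", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        errors="replace",
    )
    return result.stdout if result.returncode == 0 else ""


def has_text_layer(text: str, min_chars: int = 20) -> bool:
    """Treat a PDF as searchable if it yields enough non-whitespace text.

    The threshold is an arbitrary assumption; scanned pages usually give
    only whitespace or form feeds.
    """
    return len("".join(text.split())) >= min_chars


def pdfs_without_text(
    root: Path,
    extract: Callable[[Path], str] = extract_text,
) -> Iterator[Path]:
    """Walk root recursively and yield PDFs that appear to have no text."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            if not has_text_layer(extract(path)):
                yield path
```

Calling `pdfs_without_text(Path("/some/folder"))` and printing each result would list the candidates for OCR. The `extract` parameter is there only so the walk can be tried out without pdftotext installed. The smart group is still the simpler option, since it updates on its own.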