I’ve found a few files in my database that are non-OCR’d (and therefore non-searchable) PDFs, mostly things that were emailed to me that I didn’t check at the time. I’d like to find all of them and OCR them now.
I found this nice (old) Tuesday tip: blog.devontechnologies.com/2007/ … ocr-layer/
But the script it references returns a 404: devon-technologies.com/files … t_Text.zip
Does anyone have a copy of this script, or something similar?
I thought I could just use search or advanced search, but although I can search on Document Kind, the available choices don’t distinguish “PDF” from “PDF + Text”.
Can any of you gurus help me out here? Thanks a lot in advance!
Make a Smart Group with criteria:
Kind is PDF/PS
Word Count is 0
Or use the ready-made template: Data > New with Template > Smart Groups > PDFs (not searchable)
Thanks a lot, Jim and Christian! There’s so much depth in DEVONthink, I love it.
(I’d started thinking about checking the output of pdftotext for each file and scripting a walk through the entire database. This is so much easier! Now that I have the smart group, it’s trivial to see if and when new files without a text layer show up.)
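For anyone who does want the pdftotext route outside DEVONthink, here is a minimal Python sketch of that idea. It assumes poppler’s `pdftotext` is installed and on PATH, and that the PDFs sit in an ordinary folder (for example, files exported from the database); the folder argument and all function names are hypothetical. A PDF is treated as unsearchable when pdftotext extracts zero words, which roughly mirrors the smart group’s “Word Count is 0” criterion.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def word_count(text: str) -> int:
    """Count whitespace-separated words (an approximation of a 'Word Count' field)."""
    # str.split() with no argument also treats the form feeds pdftotext emits
    # between pages as whitespace, so an image-only PDF yields 0 words.
    return len(text.split())


def has_text_layer(pdf_path: Path) -> bool:
    """Return True if pdftotext extracts at least one word from the PDF."""
    # "-" tells pdftotext to write the extracted text to stdout.
    # check=True raises CalledProcessError if pdftotext cannot read the file.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return word_count(result.stdout) > 0


def find_unsearchable_pdfs(folder: Path) -> list[Path]:
    """Walk a folder tree and return PDFs with no extractable text (OCR candidates)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH (it ships with poppler)")
    # Compare the suffix case-insensitively so "Scan.PDF" is not missed.
    pdfs = [
        p for p in sorted(folder.rglob("*"))
        if p.is_file() and p.suffix.lower() == ".pdf"
    ]
    return [p for p in pdfs if not has_text_layer(p)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_no_text.py /path/to/exported/pdfs
    for path in find_unsearchable_pdfs(Path(sys.argv[1])):
        print(path)
```

The script only lists candidates; it doesn’t OCR anything. Inside DEVONthink the smart group is still the simpler choice, since it updates itself as new files arrive.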
Now that I know the right phrase to Google, I found this nice post from Evan K. on the complete process: 40tech.com/2015/08/17/the-be … think-ocr/
No problem - and yeah, DEVONthink has a lot of things to discover!
(PS: Evan has some really practical info about DEVONthink on his blog. A good find!)