How to detect PDFs without a text layer in Finder (perhaps OT)

Hi all,

I am unsure whether this is on topic, but here goes: I collect lots of PDFs in a dedicated folder in the Finder. Depending on whether a PDF has a text layer, I proceed differently: I either import it directly into my reference manager (Sente) or run OCR first. Importing PDFs without a text layer is possible, but my workflow depends heavily on my references being searchable, so I would like to avoid importing them.

Can any of you think of a way to automate this decision and perhaps have PDFs with and without a text layer sorted into different subfolders in Finder? I have tried, but unsuccessfully so far.

Many thanks

PS: Eventually, the renamed PDFs for which I have all the bibliographical information will be indexed by DEVONthink Pro, which also knows whether a PDF has a text layer. For various reasons I would like to tackle this problem earlier in my workflow, namely prior to indexing by DTP.

The only ideas coming to my mind were Preview (not scriptable) and Image Events scripting (doesn’t return the desired property). Therefore it’s probably impossible without third-party solutions.

Thanks for your judgment, albeit pessimistic. At least I haven’t overlooked something glaringly obvious.

Can you think of any solution (AppleScript, third-party program, etc.) that might return the value I am interested in? The problem is very real for me, and a solution might save me a lot of tedious and error-prone work.


PS: How does DTP get the info then? It can apparently tell “PDF” from “PDF+Text”

One solution is to let DEVONthink Pro index the file. Then check if the “word count” property is >0 and finally delete the record again.
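
For anyone who would rather do the same "word count > 0" check outside DEVONthink, here is a minimal sketch in Python. It is not what DTP does internally; it assumes the third-party `pdftotext` command-line tool from Poppler is installed, and the function names and the subfolder names `with_text` and `needs_ocr` are made up for illustration. A PDF whose extracted text contains no words is treated as having no text layer, which may misjudge files with a junk or partial text layer.

```python
import shutil
import subprocess
from pathlib import Path


def extract_text_with_pdftotext(pdf_path):
    """Return the text layer of a PDF using Poppler's pdftotext (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def has_text_layer(pdf_path, extract=extract_text_with_pdftotext, min_words=1):
    """True if the extracted text contains at least min_words words."""
    return len(extract(pdf_path).split()) >= min_words


def sort_pdfs(folder, extract=extract_text_with_pdftotext):
    """Move each PDF in folder into a 'with_text' or 'needs_ocr' subfolder.

    The subfolder names are hypothetical; rename them to suit your workflow.
    Returns a dict mapping each file name to the subfolder it was moved into.
    """
    folder = Path(folder)
    moved = {}
    # sorted() collects the file list before anything is moved
    for pdf in sorted(folder.glob("*.pdf")):
        target = "with_text" if has_text_layer(pdf, extract) else "needs_ocr"
        dest_dir = folder / target
        dest_dir.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(dest_dir / pdf.name))
        moved[pdf.name] = target
    return moved
```

Calling `sort_pdfs("/path/to/your/pdf/folder")` (a placeholder path) would then leave the searchable PDFs in one subfolder and the OCR candidates in the other. The `extract` parameter is there so a different text extractor can be swapped in if `pdftotext` is not available.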