How to detect pdfs without text layer in Finder (perhaps OT)

Prion · August 23, 2008, 3:54pm

Hi all,

I am unsure whether this is on topic, but here goes: I collect lots of PDFs in a dedicated folder in the Finder. Depending on whether a PDF has a text layer, I proceed differently: import it directly into my reference manager (Sente) or OCR it first. Importing PDFs without a text layer is possible; however, my workflow depends heavily on my references being searchable, so I would like to avoid importing PDFs without a text layer.

Can any of you think of a way to automate this decision and perhaps sort PDFs with and without a text layer into different subfolders in the Finder? I have tried, but unsuccessfully so far.

Many thanks
P

PS: Eventually, the renamed PDFs for which I have all the bibliographical information will be indexed by DEVONthink Pro, which also knows whether a PDF has a text layer. For various reasons I would like to tackle this problem earlier in my workflow, namely prior to indexing by DTP.

cgrunenberg · August 28, 2008, 6:09am

The only ideas that came to mind were Preview (not scriptable) and Image Events scripting (doesn’t return the desired property). Therefore it’s probably impossible without third-party solutions.

Prion · August 28, 2008, 2:01pm

Christian
thanks for your judgment, albeit a pessimistic one. At least I haven’t overlooked something glaringly obvious.

Can you think of any solution (AppleScript, third-party program, etc.) that might return the value I am interested in? The problem is very real for me, and a solution might save me a lot of tedious and error-prone work.

Prion

PS: How does DTP get the info, then? It can apparently tell “PDF” from “PDF+Text”.

cgrunenberg · August 28, 2008, 2:12pm

One solution is to let DEVONthink Pro index the file. Then check if the “word count” property is >0 and finally delete the record again.
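
For readers who want the same word-count test earlier in the workflow, here is a minimal Python sketch of one possible third-party route. It is not something posted in this thread and not how DEVONthink does it. It assumes the `pdftotext` command-line tool from Poppler is installed and on the PATH; the function names and the subfolder names `with_text` and `needs_ocr` are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def extract_text(pdf_path):
    """Return the text layer of a PDF using Poppler's `pdftotext` (assumed installed)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def has_text_layer(pdf_path, extractor=extract_text, min_words=1):
    """Treat a PDF as searchable if at least `min_words` words can be extracted."""
    return len(extractor(pdf_path).split()) >= min_words


def sort_pdfs(folder, extractor=extract_text):
    """Move each PDF in `folder` into a 'with_text' or 'needs_ocr' subfolder.

    Both subfolder names are hypothetical. Returns a dict mapping each
    file name to the subfolder it was moved into.
    """
    folder = Path(folder)
    moved = {}
    for pdf in sorted(folder.glob("*.pdf")):
        target = "with_text" if has_text_layer(pdf, extractor) else "needs_ocr"
        dest = folder / target
        dest.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(dest / pdf.name))
        moved[pdf.name] = target
    return moved
```

Calling `sort_pdfs` with the path of the collection folder would then split the files before anything is imported. Scanned PDFs sometimes carry a few stray characters, so raising `min_words` in `has_text_layer` may give more reliable results; the right threshold is a guess and would need checking against real files.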