I see that scanned PDFs that have been OCR’d are listed as “PDF+Text” in DT (elsewhere also called hidden text). But when I create a smart folder, the only choice I get is File Type = pdf, ps.
Is there some way to use DT to “find all PDFs in a folder that do not have hidden text”? I’d like to do this so I can OCR them without being distracted by the thousands of files that have already been OCR’d.
Add a ‘Kind’ column to your view window (View > Columns > Kind). Now you can sort the contents of a group by Kind and distinguish Kind = PDF (not searchable) from Kind = PDF+Text (searchable).
Then select one or more image-only PDFs and choose Data > Convert > to searchable PDF.
DEVONthink 2 includes a template for such a smart group: see Data > New from Template > Smart Groups > PDFs (not searchable).
Well now! Just the question I came seeking an answer for. Thanks.
But something “troubles” me about this.
Just as I went looking for a solution (a way to specify the particular “kind”, i.e. to differentiate between “PDF” and “PDF + text” in the search criteria), I considered that, if there wasn’t one, I could just add a “0 words” criterion to the list to produce the desired outcome.
So when I saw that the developer had included a “special” template, I was glad. But then I happened to look at how it was actually set up, and lo and behold, it simply adds the “0 words” criterion to the PDF criterion, which is exactly what I had figured was a work-around.
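To spell out what those two criteria amount to, here is a minimal Python sketch of the same filter. It is only an illustration of the logic: the Record type, its fields, and the sample file names are hypothetical stand-ins, not DEVONthink's API or its smart group format.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for a database item; not DEVONthink's actual API.
    name: str
    file_type: str   # e.g. "pdf" or "txt"
    word_count: int  # words in the text layer; assumed 0 for image-only scans


def pdfs_needing_ocr(records):
    """Keep records that are PDFs AND contain zero words."""
    return [r for r in records if r.file_type == "pdf" and r.word_count == 0]


# Made-up sample data for illustration only.
records = [
    Record("scan-001.pdf", "pdf", 0),    # image-only scan, still needs OCR
    Record("invoice.pdf", "pdf", 412),   # already has a text layer
    Record("notes.txt", "txt", 0),       # empty, but not a PDF
]

print([r.name for r in pdfs_needing_ocr(records)])  # ['scan-001.pdf']
```

Note that a filter like this relies on the assumption that an image-only PDF reports zero words; it never looks at the “+ text” part of the kind at all.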
It just seems counterintuitive that this should be necessary. (But maybe there is something that the designers/developers understood that I didn’t.)
But it seems that the more appropriate fix would be to let one specify, in the “kind” selector, something that is not already listed (in this case, by entering “+ text” as a negative condition).
I am not sure whether there are other “kind” searches that fall into this category, where the program generates a “kind” that is not offered in the list when creating a smart group.