I’ve recently encountered some PDF documents that Preview can search but DEVONthink apparently cannot. Pdfolylinn can’t extract the text layer either.
If I am on a page where a word occurs, the word can be searched, but only on that page.
My approach was to re-OCR the PDFs, but that seems a waste of time and space.
So, if I open the document and type “meadows” in the search box, nothing shows up.
If I scroll the document to page 5, 8 matches show up.
If I search for the same “meadows” using Preview, that program very slowly finds hundreds of matches. It’s almost as if the document is hierarchical, and each page is searched individually.
No words are detected in the file, as shown by the lack of a word count and by the empty Concordance inspector (available to Pro/Server users). It’s also marked as a PDF Document, not PDF+Text.
There is also something unusual about the file in that in-document conversion to a paginated PDF has no effect.
So, if DEVONthink detects it as a PDF with no words, why does “Find” work on a page-by-page basis? I really hope that the US Government Publishing Office doesn’t adopt this scheme as standard.
I am using Ventura, by the way, which does have some OCR capabilities.
No idea what’s specifically going on with this file. It’s the first instance I recall ever seeing.
PS: Ventura doesn’t have full OCR capabilities, as the content is not stored in a text layer of the file. It’s smoke and mirrors for impromptu use. Sometimes useful? Sure. Professionally useful? I wouldn’t suggest it.