I’ve recently encountered some PDF documents which Preview can search but DEVONthink seems not to be able to. Pdfolylinn can’t extract the text layer either.
If I am on a page where a word occurs, the word can be searched, but only on that page.
My approach was to re-OCR the PDFs, but it seems a waste of time and space.
That’s not good. But with the little info you’ve provided, there’s not much anyone can do to help, if help is what you’re after.
DEVONthink version? macOS version? Screenshots? Can you post a copy of the offending PDF? Do you know who created it and with what technology?
So, if I open the document and type “meadows” in the search box, nothing shows up.
If I scroll the document to page 5, eight matches show up.
If I search for the same “meadows” using Preview, that program very slowly finds hundreds of matches. It’s almost as if the document is hierarchical, and each page is searched individually.
(The re-OCRed version shows 138 instances, but there’s always the chance of errors.)
(After doing OCR)
There are no words detected in the file, as shown by the lack of a word count and by nothing appearing in the Concordance inspector (for Pro/Server users). It’s also marked as a PDF Document, not PDF+Text.
There is also something unusual about the file in that in-document conversion to a paginated PDF has no effect.
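For anyone who wants to check outside DEVONthink which pages of a file like this expose extractable text, here is a minimal sketch in Python. It is not how DEVONthink or Preview search internally; it assumes the third-party pypdf package is installed, and the function names are made up for illustration. It extracts text one page at a time and counts case-insensitive matches per page, so a page with no usable text layer simply reports nothing.

```python
import sys
from typing import Iterable


def count_matches_per_page(page_texts: Iterable[str], term: str) -> dict:
    """Return {1-based page number: match count} for pages containing the term."""
    needle = term.lower()
    counts = {}
    for number, text in enumerate(page_texts, start=1):
        # Treat a missing text layer (None or empty string) as no text.
        hits = (text or "").lower().count(needle)
        if hits:
            counts[number] = hits
    return counts


def load_page_texts(path: str) -> list:
    """Extract text page by page (assumption: pypdf is installed)."""
    from pypdf import PdfReader  # lazy import so the counter works without pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python count_matches.py some.pdf meadows
    texts = load_page_texts(sys.argv[1])
    per_page = count_matches_per_page(texts, sys.argv[2])
    print(f"{sum(per_page.values())} matches on {len(per_page)} of {len(texts)} pages")
    for page_number, hits in per_page.items():
        print(f"  page {page_number}: {hits}")
```

If this reports zero matches on every page while Preview still finds the word, that would suggest the text is stored in a way a plain extractor can’t read, which would fit the missing word count.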
So, if DEVONthink detects it as a PDF with no words, why does “find” work on a page-by-page basis? I really hope that the US Government Publishing Office doesn’t adopt this scheme as standard.
I am using Ventura, by the way, which does have some OCR capabilities.
No idea what’s specifically going on with this file. It’s the first instance I recall ever seeing.
PS: Ventura doesn’t have full OCR capabilities, as the recognized content is not stored in a text layer of the file. It’s smoke and mirrors for impromptu use. Sometimes useful? Sure. Professionally useful? I wouldn’t suggest it.