Can copy or paste a PDF, but can't search it

jerwin · December 30, 2022, 1:12am

I’ve recently encountered some PDF documents that Preview can search but DEVONthink apparently can’t. Pdfolylinn can’t extract the text layer either.

If I am on a page where a word occurs, the word can be searched, but only on that page.

My workaround was to re-OCR the PDFs, but that seems a waste of time and disk space.

rmschne · December 30, 2022, 7:17am

That’s not good. But with the info you’ve provided, there’s little anyone can do to help, if help is sought.

DEVONthink Version? macOS Version? Screen shots? Post a copy of the offending PDF? Knowledge of who created it and with what technology?

jerwin · December 30, 2022, 7:05pm

DT 3.8.7

So, if I open the document and type “meadows” in the search box, nothing shows up.
If I scroll the document to page 5, 8 matches show up.

If I search for the same “meadows” using Preview, it very slowly finds hundreds of matches. It’s almost as if the document is hierarchical and each page is searched individually.

(The re-OCRed version shows 138 instances, but there’s always the chance of OCR errors.)

(After doing OCR)

BLUEFROG · December 30, 2022, 8:22pm

There are no words detected in the file, as shown by the missing word count and, for Pro/Server users, the empty Concordance inspector. It’s also marked as a PDF Document, not PDF+Text.

There is also something unusual about the file: in-document conversion to a paginated PDF has no effect.

jerwin · December 30, 2022, 8:55pm

So, if DEVONthink detects it as a PDF with no words, why does “find” work on a page-by-page basis? I really hope the US Government Publishing Office doesn’t adopt this scheme as a standard.

I am using Ventura, by the way, which does have some OCR capabilities.

BLUEFROG · December 30, 2022, 9:18pm

No idea what’s specifically going on with this file. It’s the first instance I recall ever seeing.

PS: Ventura doesn’t have full OCR capabilities, as the content is not stored in a text layer of the file. It’s smoke and mirrors for impromptu use. Sometimes useful? Sure. Professionally useful? I wouldn’t suggest it.