I recently purchased a huge library of PDFs of mostly classic French literature from a European publisher (Arvensa Editions). The contents of the files ARE fully searchable in Acrobat (Reader DC or Pro DC) on Windows and on macOS, but are NOT searchable in Preview on macOS. (In Preview the beachball spins indefinitely until the search is terminated; no error message is generated.)
Nor are they searchable in DEVONthink 3.0.4: if I drag one of the PDFs into DT, it imports only as “PDF”, not PDF+Text, and apparently none of the content of the PDFs is indexed by DT, so it cannot be found in a database-wide search. Can anyone offer advice on how I can fix this problem, i.e., make the Acrobat-compatible PDFs compatible with DT (or with Preview, for that matter)?
This should actually be searchable (depending on the structure of the PDF of course). Can you select any text in Preview/DEVONthink of such a document, copy it to the clipboard and paste it into a new TextEdit document?
Text is selectable in Preview/DEVONthink, but copying and pasting into TextEdit yields nothing. Copying from the same file in Acrobat Pro DC and pasting into TextEdit does work.
I have since tried saving these files from Adobe Acrobat in a variety of PDF and Acrobat-compatible versions. All are searchable from Acrobat. NONE are searchable in Preview or in DT3. All versions of the documents, even the original format, are searchable in PDF Reader (Readdle’s application); NONE are searchable in Skim. (These are the other PDF readers I have on my computer.) I would appreciate anyone’s advice on how to strip out whatever element is responsible for blocking compatibility. I’m going to get no satisfaction from the publisher, who will point out that Acrobat compatibility is all they need guarantee. Since I really need DT3 compatibility to use these texts in my scholarly workflow, they’re of no use to me as is.
Skim uses the PDFKit framework like Preview & DEVONthink. One workaround might be to OCR these documents but depending on the number of pages & documents this might require a lot of time.
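Because Preview, Skim and DEVONthink all rely on PDFKit, the copy-and-paste test above can be approximated for a whole batch of files with a small script that asks PDFKit directly for each page's text. This is a minimal sketch, assuming macOS with the Swift command-line toolchain (Xcode or the Command Line Tools) installed; the function name `extractedCharacterCounts` and the file name `check_pdf_text.swift` are hypothetical. Pages reporting 0 characters are pages where PDFKit finds no text, which would likely mean DEVONthink has nothing to index either.

```swift
import Foundation
import PDFKit

/// Hypothetical helper: for each page, count the non-whitespace-trimmed
/// characters that PDFKit extracts (the same framework Preview, Skim and
/// DEVONthink use). Returns nil if PDFKit cannot open the file at all.
func extractedCharacterCounts(at url: URL) -> [Int]? {
    guard let document = PDFDocument(url: url) else { return nil }
    return (0..<document.pageCount).map { index in
        // PDFPage.string is optional; treat a missing string as empty text.
        let text = document.page(at: index)?.string ?? ""
        return text.trimmingCharacters(in: .whitespacesAndNewlines).count
    }
}

// Report on every PDF path given on the command line (does nothing if none).
for path in CommandLine.arguments.dropFirst() {
    let url = URL(fileURLWithPath: path)
    guard let counts = extractedCharacterCounts(at: url) else {
        print("\(path): PDFKit could not open the file")
        continue
    }
    let total = counts.reduce(0, +)
    let emptyPages = counts.filter { $0 == 0 }.count
    print("\(path): \(counts.count) pages, \(total) characters extracted, \(emptyPages) pages with no text")
}
```

Run it as, for example, `swift check_pdf_text.swift ~/Documents/Arvensa/*.pdf` (path hypothetical). Running the same check on a file before and after a conversion (re-saving, PDF/X-3, printing to PDF) would show whether PDFKit gains any extractable text from it.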
The documents are many thousands of pages long. There are dozens of documents. OCR is not an option. I’ve tried to force OCR in both DT3 and FineReader, resulting only in crashes and hangs. So I guess I’m stuck.
If I do the above, the file does show up as PDF+Text in DT3, but searching only finds hits in what appears to be an auto-generated Table of Contents at the end of the document. And that’s what happens in Preview AND Acrobat now, as well. In other words, before the PDF/X-3 conversion, DT3 and Preview could not search in the PDF, while Acrobat could search all the text. After the conversion, DT3, Preview, and Acrobat can only find the auto-generated (?) text in the document. Sheesh.
I sometimes have this problem too. I’ve found that opening the PDF in Acrobat Reader for Windows and printing it to a freeware PDF converter always solves it. Perhaps you want to give it a try…