This is not specifically a DevonThink issue but DevonThink is equally affected.
I have access to electronic publications published by a legal publisher. It is possible to download extracts of the books as PDFs.
However, in any PDF viewing app on the Mac (including DevonThink, Adobe Acrobat, Foxit, PDFpen Pro, etc.), copying text from the resulting PDF produces garbage. It is also not possible to search for text (as the search only sees garbage).
For example, the text selected and copied is as follows:
“The objective test In deciding whether the parties have reached agreement, the courts normally apply the objective test,6 which is further discussed at para.2-003 below. Under this test, once the parties have to all outward appearances agreed in the same terms on the same subject-matter,7 then neither can, generally,8 rely on some unexpressed qualification or reservation to show that he had not in fact agreed to the terms to which he had appeared to agree. Such subjective reservations of one party therefore do not prevent the formation of a contract.9”
However, exporting the PDF to Word (using PDFPen Pro) does not produce readable text.
But opening the same PDF in a PDF viewing app for Microsoft Windows does allow me to copy the selected passage as text. (For this I used PDF-XChange running under CrossOver.)
I clicked on the link and the file downloaded and showed just fine in Preview. I “moved” it to the Global Inbox and DEVONthink imported it as a PDF (no OCR). When viewed in DEVONthink, it shows only white space.
I “moved” it from Preview to the desktop, then dragged and dropped it into a DEVONthink folder, and it’s viewable in DEVONthink. No OCR.
Copy/pasting from Preview and DEVONthink gives gobbledygook characters in Word.
With DEVONthink, I OCR’ed the version that looked OK, and after that step, the content could be copied and pasted into Word just fine. I could also paste it into my text editor (BBEdit) as plain text.
I then OCR’ed the version that was blank white space, and that made no change to viewing. It is still just white when viewed in DEVONthink or any PDF viewer I tried (PDFPen and Preview). OCR did not do anything.
I opened the downloaded version stored on the Desktop (which I previously dragged into DEVONthink) with PDFPen. I tried “clear OCR Layer on page”, and nothing happened. That was unexpected, as I was then going to re-OCR it with that tool, but as there apparently is an OCR layer there, I could not do that.
Not sure what to make of all that, frankly. I’ve forgotten more than I knew about the vagaries of PDFs.
I think it’s just a “funny” PDF of some sort.
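If anyone needs to sort through a batch of these downloads, a rough way to decide which ones need OCR is to score the text you get from copy/paste. The following is only a sketch: the word list, the 0.15 threshold, and the function names are my own assumptions, it only makes sense for English text, and it says nothing about why the text layer is broken.

```python
import re

# A small set of very common English words (an arbitrary, assumed list).
COMMON_WORDS = {
    "the", "of", "and", "to", "in", "a", "is", "that", "on", "for",
    "as", "not", "be", "have", "which", "by", "or", "it", "with",
}


def common_word_ratio(text: str) -> float:
    """Return the fraction of alphabetic tokens that are common English words."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    if not words:
        # No recognisable words at all (empty text or pure symbol soup).
        return 0.0
    return sum(word in COMMON_WORDS for word in words) / len(words)


def looks_like_garbage(text: str, threshold: float = 0.15) -> bool:
    """Guess whether copied text is garbage; the threshold is an assumption."""
    return common_word_ratio(text) < threshold


if __name__ == "__main__":
    readable = ("In deciding whether the parties have reached agreement, "
                "the courts normally apply the objective test")
    scrambled = "Wkh remhfwlyh whvw"  # made-up garbage, not from the actual PDF
    print(looks_like_garbage(readable))
    print(looks_like_garbage(scrambled))
```

Paste the copied text into `looks_like_garbage`; anything it flags is a candidate for the OCR step described above. Very short snippets or passages made up mostly of names and citations can be misjudged, so treat the result as a hint only.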
It’s so good I wish I coded it.
I still use TextEdit for some support stuff, but I use CotEditor most often.
BBEdit too, but I never compose in it. I use it as a can opener, e.g., for inspecting the raw code of PDFs, etc.
PS: However, for composition in Markdown, etc., DEVONthink is still my default.
Since I’m still on Ventura, I missed that it had a 5.0 release (min. requirement is Sonoma) with a new sidebar for folder navigation, among other things. I don’t think it’s enough by itself to make me upgrade, but it does sway the needle…
(Edit: sorry for reviving the thread. I saw some activity and thought this was a recent reply.)
My guess would be DRM on the publisher’s side. Certainly if you see the same behavior in every viewer, the publisher would be the source of the problem.