After OCR I cannot highlight / mark text in the PDF

After OCRing some PDFs, I realised that I cannot highlight / mark text (e.g. to copy it). This happens only in the DT app; when I open the PDF in PDF Expert, there is no problem. Furthermore, it does not affect older OCRed files, only new ones. Is anyone experiencing similar problems? This happens on my MacBook as well as on my iMac.

Which version of DEVONthink do you use and which languages do the PDFs contain?

3.5 … German

Editing of certain PDF documents is disabled if the PDFKit of macOS might corrupt them. Could you send me a copy of such a file?

I’m experiencing the same issue with certain files (bank statements that are “PDF+text” originally, i.e. I do not OCR them).
No way to select/copy in either DT3 or Preview, but no problem in Acrobat Reader.

In this case the documents don’t seem to be fully compatible with macOS’ PDFKit framework.

It’s interesting: If I do a cmd-A, the first part of the document is actually selected – just the sender and receiver information. All account details after remain unselected. No hints in the document information, either: I have the right to copy. But I can’t.

On the other hand, searching in the document works in Preview as it does in DT3, but the search result is not highlighted in the PDF.

Oh, and PDFPen is able to select the text and highlight search results.

And then there’s this interesting quote on stackexchange:

I received feedback from our engineers. They confirmed that there is an issue with the Preview feature in Catalina. They were able to reproduce the issue by testing it on Macs running both Mojave and Catalina and got the same result that you did. This is something that has been reported and will be addressed in a future software update.

In my case, the PDF was fine after I printed it as PDF from Preview. Simply using “Save as…” didn’t work, because the program refused to save.

The only difference that I can see between a working and a non-working document is the ABBYY version. Version 12 could be the culprit… perhaps…
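For anyone who wants to compare the generating software of a working and a non-working file without opening each one, here is a minimal sketch in Python. It is not part of DEVONthink or ABBYY; the function names `find_pdf_tools` and `find_pdf_tools_in_file` are made up for illustration. It assumes the PDF stores its Info dictionary uncompressed and writes `/Producer` and `/Creator` as simple literal strings, so it will find nothing in files that keep this metadata in compressed object streams, hex strings, or XMP only.

```python
import re
from pathlib import Path

# Matches /Producer or /Creator followed by a PDF literal string "( ... )".
# Assumption: the value is a plain literal string without nested,
# unescaped parentheses and is not stored in a compressed stream.
_FIELD = re.compile(rb"/(Producer|Creator)\s*\(((?:\\.|[^\\()])*)\)")


def find_pdf_tools(data: bytes) -> dict:
    """Return the Producer and Creator strings found in raw PDF bytes."""
    found = {"Producer": [], "Creator": []}
    for key, value in _FIELD.findall(data):
        # Undo backslash escapes for parentheses and backslashes only.
        text = re.sub(rb"\\([()\\])", rb"\1", value).decode("latin-1")
        name = key.decode("ascii")
        if text not in found[name]:
            found[name].append(text)
    return found


def find_pdf_tools_in_file(path: str) -> dict:
    """Convenience wrapper (hypothetical helper): read a file and scan it."""
    return find_pdf_tools(Path(path).read_bytes())
```

Calling `find_pdf_tools_in_file` on one working and one non-working PDF and comparing the two results should show whether the producer string really differs. An empty result only means the metadata is not stored in the simple form this sketch looks for.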

The same problem as in DT3 also occurs in the Preview app of macOS. I’ll send you 2 files: a working one and one with the described problem…

I suppose it’s a problem with PDFKit (see above). Try printing the file to a new PDF from Preview (printing, not “Save as…”). Does that help?


Yes… that helps. Exporting as document and reimporting also helps…

Do you use an intelligent rule for OCRing the document?

No. And I have the problem you described with external PDFs, i.e. ones that already contain text.

I’m experiencing the same problem on all my OCR imports, even the ones where the text is very clear and legible. PDF text IS selectable if the document is opened in PDF Expert (though not in the macOS Preview app). The only remedy I’ve found is to right-click in DT 3.5 and choose “Convert to Searchable PDF” a second time.

A screenshot of Preferences > OCR plus a document before importing would be great, thanks.

As I’ve just said to you in a PM (but this may be of use to others): if “Enter metadata after text recognition” is ticked in Preferences, this will fix the problem. But obviously it means dealing with a dialogue box, which may not be optimal.


I just applied an OCR rule (from OCR and DEVONThink To Go) to both PDFs and images from the Evernote Scannable app. The resulting PDF+text files have searchable text, but I cannot highlight/copy this text in DT or Preview either (the text is selectable in PDF Expert).

Here are my OCR prefs:

I will attach both image and PDF pre and post processing by DT OCR as well. Scannable Document on Jun 17, 2020 at 9_41_49 AM.pdf (692.2 KB)

Post processing: Post OCR Scannable Document on Jun 17, 2020 at 9_41_40 AM.pdf (352.0 KB) Post OCR Scannable Document on Jun 17, 2020 at 9_41_40 AM.pdf (352.0 KB) Post OCR of pdf file.pdf (333.4 KB)

We are investigating an OCR bug related to Apple’s PDFKit.

Open the file in Preview.
Hold the Option key and select File > Save As.
Leave the filename the same and overwrite the existing file.

As a temporary workaround, enable DEVONthink’s Preferences > OCR > Searchable PDF: Enter metadata after text recognition. Even if you don’t change the metadata, pressing Save should allow the PDFs to work as expected.
Thanks for your patience and understanding.

Thanks for working on this! I do have Enter metadata after text recognition checked in prefs (see screenshot - is that correct?), but that workaround doesn’t seem to work for me. Then again, I’m not getting prompted for metadata. Hmm.