After OCR I cannot highlight / mark text in the PDF

In this case the documents don’t seem to be fully compatible with macOS’s PDFKit framework.

It’s interesting: If I do a cmd-A, the first part of the document is actually selected – just the sender and receiver information. All account details after remain unselected. No hints in the document information, either: I have the right to copy. But I can’t.

On the other hand, searching the document works in Preview as it does in DT3, but the search result is not highlighted in the PDF.

Oh, and PDFpen is able to select the text and highlight search results.

And then there’s this interesting quote on stackexchange:

I received feedback from our engineers. They confirmed that there is an issue with the Preview feature in Catalina. They were able to reproduce the issue by testing it on Macs running both Mojave and Catalina and got the same result that you did. This is something that has been reported and will be addressed in a future software update.

In my case, the PDF was fine after I printed it to PDF from Preview. A simple “Save as…” didn’t work, because the program refused to save.

The only difference I can see between a working and a non-working document is the ABBYY version. Version 12 could be the culprit… perhaps…
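For anyone who wants to compare the generator information of a working and a non-working file without opening each one, here is a rough Python sketch. It is only a heuristic, not a real PDF parser: it assumes the Producer/Creator entries are stored uncompressed as simple parenthesised strings, so it will return nothing for files that keep their metadata in compressed streams or use hex/UTF-16 strings. The function name `pdf_generator_fields` is made up for this example.

```python
import re
from pathlib import Path

# Matches literal-string entries such as: /Producer (Some OCR Engine 12)
# Assumption: the entry is uncompressed and written as a parenthesised string
# without nested, unescaped parentheses.
_INFO_FIELD = re.compile(rb"/(Producer|Creator)\s*\(((?:\\.|[^\\()])*)\)")


def pdf_generator_fields(data: bytes) -> dict[str, str]:
    """Return the first Producer/Creator strings found in raw PDF bytes."""
    fields: dict[str, str] = {}
    for name, raw in _INFO_FIELD.findall(data):
        # Undo the simple backslash escapes \( \) and \\ .
        text = re.sub(rb"\\([()\\])", rb"\1", raw)
        # Keep only the first occurrence of each field.
        fields.setdefault(name.decode("ascii"), text.decode("latin-1"))
    return fields


def compare(paths: list[str]) -> None:
    """Print the generator fields of each file side by side."""
    for path in paths:
        print(path, pdf_generator_fields(Path(path).read_bytes()))
```

Calling `compare(["working.pdf", "broken.pdf"])` (hypothetical file names) prints whatever Producer/Creator strings the heuristic finds in each file. An empty result only means the heuristic could not read the metadata, not that the file has none.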

The same problem as in DT3 also occurs in the macOS Preview app. I’ll send you two files: a working one and one with the described problem…

I suppose it’s a problem with PDFKit (see above). Try printing the file to a new PDF from Preview (not “Save as…”!). Does that help?

Yes… that helps. Exporting as document and reimporting also helps…

Do you use an intelligent rule for OCRing the document?

No. And I have the problem you described with external PDFs, i.e. ones that already contain text.

I’m experiencing the same problem on all my OCR imports - even the ones where the text is very clear and legible. The PDF text IS selectable if the document is opened in PDF Expert (though not in the macOS Preview app). The only remedy I’ve found is to right-click in DT3.5 and choose “Convert to Searchable PDF” a second time.

A screenshot of Preferences > OCR plus a document before importing would be great, thanks.

As I’ve just said to you in a PM — but this may be of use to others — if “Enter metadata after text recognition” is ticked in Preferences this will fix the problem. But obviously it means dealing with a dialogue box which may not be optimal.

I just applied an OCR rule (from OCR and DEVONthink To Go) to both PDFs and images from the Evernote Scannable app. The resulting PDF+text files have searchable text, but I also cannot highlight or copy this text in DT or Preview (the text is selectable in PDF Expert).

Here are my OCR prefs:

I will attach both the image and the PDF, before and after processing by DT OCR, as well. Scannable Document on Jun 17, 2020 at 9_41_49 AM.pdf (692.2 KB)

Post processing: Post OCR Scannable Document on Jun 17, 2020 at 9_41_40 AM.pdf (352.0 KB) Post OCR of pdf file.pdf (333.4 KB)

We are investigating an OCR bug related to Apple’s PDFKit.

Open the file in Preview.
Hold the Option key and select File > Save As.
Leave the filename the same and overwrite the existing file.

As a temporary workaround, enable DEVONthink’s Preferences > OCR > Searchable PDF: Enter metadata after text recognition. Even if you don’t change the metadata, pressing Save should allow the PDFs to work as expected.
Thanks for your patience and understanding.

Thanks for working on this! I do have Enter metadata after text recognition checked in prefs (see screenshot - is that correct?), but that workaround doesn’t seem to work for me. Then again, I’m not getting prompted for metadata. Hmm.

Ahh… I missed that you’re using a smart rule to process incoming DEVONthink To Go files On Synchronization. That doesn’t trigger the Metadata panel.
@aedwards, I’m not sure if it should.

On a side note, ACEs are an amazing scientific find, aren’t they? I work with kids in juvenile lockup in a volunteer capacity so I’ve looked into it a bit.

The metadata panel should not be displayed from a smart rule.

OK, thanks … I’ll wait for a fix from Apple or DT before running this rule much more.

Indeed! Bruce Perry’s work is also really interesting …