System Dialog "Print to PDF" yields random OCR text

When I display a document (say from Bank of America) in a Safari window and then use the system “Print to PDF” option with the “Save PDF to DEVONthink 3” suboption, the OCR is garbage, even though I can copy and paste correct text from the Safari display of the page.

This seems to be more likely with Bank of America sites. What am I doing wrong?

We have no control over the output of printing to PDF from any application. DEVONthink only receives the file, nothing more.

Did you actually perform OCR? A PDF printed to another app should rarely need OCR. Or do you just mean that the text layer is garbage? This is not controlled by DEVONthink, as the system generates the PDF and then sends it to DEVONthink.

What I’m experiencing seems somewhat random, in that some documents “printed” from BoA have correct OCR. Is it possible to instruct DT to OCR every document that shows up in its inbox, so there’s a consistent text layer?

Sincerely,

Jeff

This is a misunderstanding or misapplication of terminology.

OCR is the process of recognizing shapes in images as letterforms and creating text from them, very commonly as a text layer on a PDF.

A document printed from a text-based original, say a rich text file, a web page, or a PDF, already includes text. There is no need to run OCR on such documents, and doing so can result in a less accurate document.
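If you want a rough way to tell whether a text layer is damaged before deciding anything about OCR, you could copy some text out of the PDF and run it through a small check like the one below. This is only a sketch in Python, not anything DEVONthink does: the function name `suspicious_ratio` is made up for illustration, and the heuristic (counting control, private-use, unassigned, and replacement characters) is an assumption about what a broken text layer often looks like. It will not catch garbage that consists of ordinary but wrong letters.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like debris.

    Hypothetical heuristic: a character counts as suspicious if it is the
    Unicode replacement character (U+FFFD) or falls in the control (Cc),
    private-use (Co), or unassigned (Cn) categories.
    """
    # Ignore anything Python treats as whitespace so line breaks and tabs
    # in the pasted text do not skew the result.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in {"Cc", "Co", "Cn"}
    )
    return bad / len(chars)


if __name__ == "__main__":
    # Paste text copied from the PDF between the quotes.
    sample = "Statement balance: $1,234.56"
    print(f"{suspicious_ratio(sample):.2%} of characters look suspicious")
```

A ratio near zero suggests the text layer is at least made of normal characters. A high ratio points to a problem in how the PDF was generated, which is the kind of file worth attaching to a support ticket.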

Open a support ticket and attach a problematic PDF.