When I display a document (say, from Bank of America) in a Safari window and then use the system “Print to PDF” option with the “Save PDF to DEVONthink 3” sub-option, the OCR is garbage, even though I can copy and paste correct text from the Safari display of the page.
This seems to be more likely with Bank of America sites. What am I doing wrong?
Did you actually perform OCR? A PDF printed to another app should rarely need OCR. Or do you just mean that the text layer is garbage? This is not controlled by DEVONthink, as the system generates the PDF and then sends it to DEVONthink.
What I’m experiencing seems somewhat random, in that some documents “printed” from BoA have correct OCR. Is it possible to instruct DT to OCR every document that shows up in its inbox, so there’s a consistent text layer?
This is a misunderstanding or misapplication of terminology.
OCR is the process of recognizing shapes in images as letterforms and creating text from them, very commonly as a text layer on a PDF.
A document printed from a text-based original, say a rich text file, web page, or PDF, already includes text. There is no need to do OCR on such documents, and doing so can actually produce a less accurate result.
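If it helps to separate "this PDF has a bad text layer" from "this PDF needs OCR", here is a rough Python sketch of a sanity check you could run on text copied out of the PDF. It is only a heuristic under my own assumptions: the function name, the thresholds, and the set of characters counted as "word-like" are all hypothetical, tuned for English text, and have nothing to do with how DEVONthink or macOS work internally.

```python
import re

# Hypothetical heuristic: a token counts as "word-like" if it is made of
# ordinary letters, digits, and common punctuation, and contains at least
# one letter or digit.
_WORDLIKE = re.compile(r"[A-Za-z0-9$.,:;()/%'-]+")


def text_layer_looks_usable(text: str,
                            min_chars: int = 20,
                            min_word_ratio: float = 0.6) -> bool:
    """Guess whether text copied from a PDF is a usable text layer.

    Returns False when there is almost no text (likely an image-only PDF)
    or when most tokens are not word-like (likely a garbled text layer).
    """
    tokens = text.split()
    # Too little text: treat it as having no real text layer.
    if sum(len(t) for t in tokens) < min_chars:
        return False
    wordlike = [
        t for t in tokens
        if _WORDLIKE.fullmatch(t) and any(c.isalnum() for c in t)
    ]
    return len(wordlike) / len(tokens) >= min_word_ratio


if __name__ == "__main__":
    sample = "Your statement balance as of March 1 is $1,234.56"
    print(text_layer_looks_usable(sample))
```

Usage would be: select all the text in the saved PDF, paste it into a string, and pass it to the function. A False result on a PDF that displays fine suggests the text layer itself is the problem, which is the only case where running OCR over the document might be worth trying.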