OCR questions

I keep getting a dialogue when I try to apply OCR to a PDF, asking whether I’m sure I want to apply OCR again and noting that OCR works best with files at 300 DPI. First, I haven’t applied OCR to these files (their Kind is strictly PDF, not PDF+Text). Second, I’ve scanned many documents at 150 DPI. Will this reduce OCR accuracy?

DEVONthink’s ABBYY OCR engine seems to believe the PDFs already have a text (OCR) layer, though it’s curious that DEVONthink is not displaying “PDF+Text” as their Kind. To get that discrepancy looked at, it’s best to write DEVONthink Support (info at devontechnologies.com) and send a few sample documents so they can evaluate the problem using your actual data.

You didn’t mention your scanning procedure. Does your scanner have an option to OCR documents after they are scanned? Sometimes this is the default on a scanner, and you’ll need to turn the option off before using ABBYY.

Though 150 dpi can produce poor OCR results in general, your own results will vary depending on the nature of the original paper document (image, text), its quality (skewed, fuzzy, sharp), your scanner, and the OCR software you use. It is easy to check this: make sure your scanner is not OCRing files after they are scanned, then select a PDF made from a document scanned at 150 dpi, OCR it with DEVONthink, and convert it to text (Data > Convert > to Rich Text). Do the same with the same document scanned at 300 dpi, then compare the two text results.
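
If eyeballing the two results gets tedious, a short script can do the comparison for you. The following is only a rough sketch in Python using the standard difflib module; it is not a DEVONthink feature, and it assumes you have pasted or exported the two converted texts as plain text. The function names and the sample strings are made up for illustration. Note that it only shows where the 150 dpi and 300 dpi results disagree with each other, so you still need to check those spots against the paper original to see which one is right.

```python
import difflib


def similarity(text_a: str, text_b: str) -> float:
    """Return a ratio between 0 and 1 for how closely two OCR results agree, word by word."""
    matcher = difflib.SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
    return matcher.ratio()


def differing_words(text_a: str, text_b: str) -> list:
    """Return (words_from_a, words_from_b) pairs for every place the two results disagree."""
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs


# Invented sample strings standing in for the two converted texts (not real OCR output).
text_150dpi = "Invoice total is due within th1rty days"
text_300dpi = "Invoice total is due within thirty days"

print(f"agreement: {similarity(text_150dpi, text_300dpi):.2f}")
for from_150, from_300 in differing_words(text_150dpi, text_300dpi):
    print(f"150 dpi: {from_150!r}   300 dpi: {from_300!r}")
```

A low agreement score, or a long list of differences, on typical documents would suggest rescanning at 300 dpi is worth the larger files.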