Right now, when I scan a document with my ScanSnap, the ScanSnap software runs its own OCR on it first and then saves it to DEVONthink, which runs its own OCR on it again.
Obviously, I don’t need to do both. Which one should I turn off?
You may also want to check the file size and image quality of the generated files. The OCR process of DEVONthink Pro Office insists on recompressing the scanned image using the JPEG algorithm, which is quite a bad choice for black-and-white documents: JPEG is lossy and designed for continuous-tone photos, so it tends to produce artifacts around sharp text edges and larger files than a bilevel codec would. You can select the target resolution and quality of the JPEG process, which allows you to trade quality against size, but unfortunately you can’t turn the recompression off altogether.
So if your scanner already outputs OCR’d PDF documents with CCITT Group 4 or even JBIG2 compression, you might want to steer clear of DTPO’s bundled OCR.
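If you want to check which compression a given PDF actually uses, before and after it has gone through OCR, a rough sketch like the following may help. It is not part of ScanSnap or DEVONthink; it simply scans the raw PDF bytes for the standard stream filter names (CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode). The function name and the labels are my own, and it is only a heuristic: it assumes an unencrypted file with the filters named directly in the image dictionaries, and it cannot tell Group 3 from Group 4.

```python
import re
import sys
from collections import Counter

# Standard PDF stream filter names mapped to the compression they indicate.
# The labels are illustrative; the filter names come from the PDF format itself.
IMAGE_FILTERS = {
    b"CCITTFaxDecode": "CCITT fax (Group 3 or 4)",
    b"JBIG2Decode": "JBIG2",
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
}

FILTER_RE = re.compile(rb"/(CCITTFaxDecode|JBIG2Decode|DCTDecode|JPXDecode)\b")


def count_image_filters(data: bytes) -> Counter:
    """Count occurrences of image-related filter names in raw PDF bytes.

    Heuristic only: this does not parse the PDF, so a filter name that
    happens to appear inside binary stream data would also be counted.
    """
    counts = Counter()
    for match in FILTER_RE.finditer(data):
        counts[IMAGE_FILTERS[match.group(1)]] += 1
    return counts


if __name__ == "__main__":
    # Usage: python pdf_filters.py scan_before.pdf scan_after.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            counts = count_image_filters(fh.read())
        print(path, dict(counts) or "no image filters found")
```

Run it on the same scan once as it comes out of the scanner software and once after DTPO’s OCR. If the first file reports CCITT or JBIG2 and the second reports JPEG, the page images were recompressed along the way.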
Right, it’s not the OCR engine itself that is “crappy”, but the way the PDF is reassembled when the text layer is attached. Here we’d want the software to give us the option to preserve the images 1:1, at the very least whenever the source uses CCITT Group 4 or JBIG2 compression.