OCR formatting: scanning with 2 columns

Sometimes I have a scan that has two columns and looks something like this:

The OCR in DEVONthink doesn’t recognize the columns. It generates lines of text that run across the whole page, making copying and pasting a real problem.

Other OCR engines (like the one in PDFpen) handle columns wonderfully… Is there any way I can access other OCR engines I have licenses to from within DEVONthink? Is there a preference in the IRIS engine I could tweak to make this problem go away?

My workflow:
Drop a bunch of stuff into DEVONthink and sort it into the appropriate folders. At the end of the day, select all non-text PDFs and let DEVONthink run OCR on them overnight. I REALLY enjoy the batch “convert to pdf+text” in DEVONthink, and I don’t want to have to run these docs through a different OCR engine manually.

Any advice?

I saved your image file to my Desktop, then used File > Import > Images (with OCR) to import it to a database as a searchable PDF.

The PDF looked OK, displaying the side-by-side pages as individual columns. It’s easy to select just part of the text: hold down the Option key, “draw” a box around the desired text, then copy the selection to the clipboard. That’s the behavior of OS X PDFKit, which DEVONthink uses to display PDFs.

However, I noted that the OCR accuracy was extremely poor. You may want to take a look at your scanner settings. Perhaps the resolution is set too low. Acceptable OCR accuracy requires a scan of at least 300 dpi.
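
If you want to check a scan against that 300 dpi figure before running OCR, here is a rough sketch in Python. It only does the arithmetic (pixels divided by inches); the function names and the US Letter page size in the example are my own assumptions for illustration, not anything DEVONthink or the scanner software provides, and I don't know the actual dimensions of the posted image.

```python
MIN_OCR_DPI = 300  # the minimum resolution mentioned above


def effective_dpi(pixels: int, inches: float) -> float:
    """Return the scan resolution along one axis, in dots per inch."""
    if inches <= 0:
        raise ValueError("physical size must be positive")
    return pixels / inches


def meets_ocr_minimum(width_px: int, height_px: int,
                      width_in: float, height_in: float,
                      minimum: float = MIN_OCR_DPI) -> bool:
    """Check that the lower of the two axis resolutions reaches the minimum."""
    horizontal = effective_dpi(width_px, width_in)
    vertical = effective_dpi(height_px, height_in)
    return min(horizontal, vertical) >= minimum


if __name__ == "__main__":
    # Hypothetical example: a US Letter page (8.5 x 11 in) at two scan sizes.
    print(meets_ocr_minimum(1275, 1650, 8.5, 11.0))  # 150 dpi on both axes
    print(meets_ocr_minimum(2550, 3300, 8.5, 11.0))  # 300 dpi on both axes
```

The pixel dimensions of an image are shown in most image viewers, so the only other thing you need is the physical size of the page that was scanned.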

OCR accuracy was also very poor with PDFpenPro, which didn’t initially recognize the image as a scan, probably because of low resolution. I’ve found that for scanned images with acceptable resolution, DT Pro Office’s OCR engine is much more accurate than PDFpenPro OCR.

OMG! Option key! This solves everything.


Using the Alt (Option) key to select a column really should be added to the DEVONthink help files. It would have saved me time searching through the forum for this information.