Ensuring OCR takes place, even 'sideways'

Hello All,

I’ve been using DTPro Office 2.3 (DTPO) for a while but this is my first post. I’ve searched the forums for an answer but with no luck, so maybe someone can help me.

I’m scanning a lot of newspaper and magazine articles into DTPO using a Fujitsu ScanSnap S1500M. Other info: 24-inch iMac just over 2 years old, 8GB RAM, OS X 10.7.2, everything saved to the internal 1TB drive with 350GB+ unused space.

The ScanSnap is brilliant and speedy, but DTPO doesn’t always interact with it consistently. In particular, DTPO seems to be fussy about whether it will save a scan as a PDF + Text (which is what I want), or just plain PDF (which I don’t want). Articles placed sideways seem to fox it every time – I end up with a plain PDF file that I cannot use.

Is this because I’m using the ScanSnap’s carrier sheet? I get PDF-only results even with properly-oriented articles at times.

Very often, but not always, if I repeat the scan I get the PDF + Text result I’m hoping for; but sometimes I don’t get that result and have to cut the article into smaller pieces and re-scan them in the upright orientation.

Of course, it would also help if I could tell DTPO to carry out OCR on the PDF file as a secondary operation, but I cannot find any way of doing that.

Any suggestions? Many thanks,

– Tim

OCR doesn’t work well if the image is not aligned correctly. The software can make some good guesses about alignment, but it cannot “see” what you see.

Make sure that you have the auto-rotate option set in the ScanSnap software (Settings > Scanning > Option > Allow automatic image rotation). The ABBYY software will attempt to align an image before recognizing the text, but it cannot overcome badly aligned images. Using the ScanSnap carrier sheet should improve the results. However, if your material is shaped so that it only fits through the scanner sideways, then you might need to scan, rotate manually, and then OCR.

To rotate manually, select the document in DTPO. Have the PDF sidebar displayed (View > PDF Display > Sidebar). Control-click in the sidebar and choose “Select All”. Control-click again and choose one of the rotate commands to turn the pages 90 degrees at a time until the document is in normal reading orientation. Then, attempt to OCR it.**

[size=85]** Select the document. Choose Data > Convert > to Searchable PDF, or control-click the document name and use the Convert menu. Or, if the Actions icon appears in your toolbar, use that. “Conversion” options are context-specific, depending on the kind of document, and sometimes do not appear at all (e.g., you cannot convert a bookmark to a searchable PDF).[/size]
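If it helps to think of the "90 degrees at a time" step as arithmetic, here is a small Python sketch. It is only an illustration and not anything DTPO provides: the function names are made up, and it assumes you know how far the page is currently turned clockwise from upright and that both a clockwise and a counterclockwise rotate command are available.

```python
def rotations_to_upright(current_angle: int) -> int:
    """Number of 90-degree clockwise turns needed to bring a page upright.

    current_angle: how far the page is currently rotated clockwise from
    upright, in degrees. Must be a multiple of 90 (hypothetical input;
    you would judge this by eye from the sidebar thumbnail).
    """
    if current_angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90 degrees")
    # Normalise to 0..359, then count the quarter turns left to reach 360.
    return ((360 - current_angle % 360) % 360) // 90


def rotation_plan(current_angle: int) -> tuple[str, int]:
    """Pick the direction that needs the fewest rotate commands."""
    clockwise = rotations_to_upright(current_angle)
    counterclockwise = (4 - clockwise) % 4
    if clockwise <= counterclockwise:
        return ("clockwise", clockwise)
    return ("counterclockwise", counterclockwise)


if __name__ == "__main__":
    for angle in (0, 90, 180, 270):
        direction, count = rotation_plan(angle)
        print(f"page turned {angle} degrees: rotate {direction} x{count}")
```

In other words, a page lying on its side never needs more than one rotate command if you pick the right direction, and an upside-down page needs two.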

Thanks, korm:

The only step description you left out was “Then, attempt to OCR it”. But I discovered the Actions drop-down menu has a Convert → to Searchable PDF sub-item which does the trick. Once again, my thanks. I am delighted at the speed with which you solved my problem (I recently posted a problem to Apple’s board about Mail.app and got no help whatsoever).

– Tim