ScanSnap Directly into DTPO?

When I first downloaded DTPO beta the OCR function was disabled so I scanned to a “To Process” folder, OCR’d in Adobe Pro and then imported into DTPO. Now that OCR has been enabled, I would like to scan directly into DTPO and have the OCR done there. However, whenever I scan, the files continue to be placed in the “To Process” folder.

I checked ScanSnap Manager’s preferences: under the Application tab, DEVONthink Pro is selected and “Use Quick Menu” is unchecked. The image saving folder remains “To Process”.

Is there something else I need to do to get my ScanSnap to send the scan directly to DTPO and perform OCR?

Those are the settings I have and it’s working fine.

Note: check ALL windows to see whether a dialog is asking you to okay the launch of the OCR helper. This is the thing doing the OCR, and if it can’t run, OCR and import won’t work. Because this binary is not signed, OS X will not let DTP launch it without your approval (the first time).

Thanks CatOne! Everything seems to be working now. I had to launch it separately; will it load automatically in the future?

  1. Make certain that you have updated to DTPO2 pb3r2 and have installed the ABBYY OCR engine; you can check the latter by choosing Help > Install Add-ons. If the ABBYY OCR software has already been installed, the ABBYY check box will be unchecked. If you need to install it, you can choose the smaller download for your computer (Intel or PPC) instead of the Universal file. (If the ABBYY software has been downloaded, it will be found in ~/Library/Application Support/: there will be a folder named Abbyy, and its Info panel should display a large file size.)

  2. In ScanSnap Manager, choose Settings and click on the Application tab. Confirm that the application choice is DEVONthink Pro, and that it points to your installed copy of DT Pro Office 2.0 pb3r2.

  3. Insert a sheet into the ScanSnap sheet feeder and press the Scan button. The scan output will be sent to DTPO2 for OCR and storage of the searchable PDF. If this is the first OCR run, there may be a pause for a while. OS X has a “protection” nanny feature that is suspicious of new software it sees opening, and it may ask whether you wish to open software downloaded from the Internet. Confirm. If you don’t see the message, the first OCR attempt may fail. If so, try again; it should work properly from then on.

  4. I get good results with the ScanSnap set for Best (Faster), Color, PDF output and no Color Compression.
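If you would rather not dig through Finder's Info panel for the check in step 1, here is a minimal Python sketch that reports whether the Abbyy folder exists and how large it is. The folder path is the one from step 1; the function names are my own, and what counts as a "large" size is left to your judgment since I don't know the exact figure.

```python
import os
from pathlib import Path


def folder_size_bytes(folder):
    """Return the total size in bytes of all regular files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted as local data.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def describe_addon_folder(folder):
    """Return a one-line report: 'not found' or the folder size in MB."""
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        return f"{folder}: not found"
    size_mb = folder_size_bytes(folder) / (1024 * 1024)
    return f"{folder}: {size_mb:.1f} MB"


if __name__ == "__main__":
    # Path taken from step 1 above.
    print(describe_addon_folder("~/Library/Application Support/Abbyy"))
```

A "not found" result, or a size of only a few MB, would suggest the add-on download never completed.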

If you set DTPO2 Preferences > OCR to send the original PDF to trash, you can ignore the location to which ScanSnap sends the file resulting from the scan — it will be deleted after OCR.

You might experiment with the DTPO2 Preferences > OCR for setting accuracy. For some paper copy I get acceptable, or even improved results using the Fast accuracy setting. For other types of copy, the Balanced or High Accuracy settings may work better. ABBYY uses non-linear algorithms for text recognition — if you scan and OCR the same document several times, you may see differences in text recognition.
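If you want to put a number on those run-to-run differences, one rough approach is to copy the recognized text out of two OCR results and compare them word by word. The sketch below uses Python's standard difflib module; the function names and sample strings are made up for illustration, and getting the text out of the PDFs is left to you.

```python
import difflib


def ocr_similarity(text_a, text_b):
    """Return a 0..1 similarity ratio between two OCR texts, by word."""
    words_a = text_a.split()
    words_b = text_b.split()
    return difflib.SequenceMatcher(None, words_a, words_b).ratio()


def differing_words(text_a, text_b):
    """Return (from_a, from_b) pairs for each stretch where the texts differ."""
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Hypothetical output from two OCR passes over the same page.
    run_fast = "the quick brovvn fox"
    run_high = "the quick brown fox"
    print(ocr_similarity(run_fast, run_high))
    print(differing_words(run_fast, run_high))
```

Comparing by word rather than by character keeps the report readable: each mismatch shows up as one misread word rather than a scatter of single letters.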

I use a yellowed and stained 10-page paper as a tough candidate for OCR accuracy. I get the best results in this case with ScanSnap set at Best, B&W, PDF and DTPO2 pb3r2 Preferences > OCR set for High OCR accuracy. The recognition is very good, better than with IRIS, and the searchable PDF is slightly more than half the size produced by IRIS, and looks better. With Preferences > OCR set to save searchable PDFs at 150 dpi and 50% image quality, this paper looks better in PDF than does the original paper copy.
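For a feel of what the 150 dpi setting means in pixels, here is a small arithmetic sketch. It assumes a US Letter page (8.5 x 11 inches), and the 300 dpi comparison figure is purely illustrative; neither comes from the scanner or DTPO2 settings. The byte counts are for the raw, uncompressed image, so the real PDF will be far smaller once the image quality setting is applied.

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the raw image size in bytes before any compression."""
    width_px, height_px = page_pixels(width_in, height_in, dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed page size: US Letter. 24 bits per pixel means full color.
    for dpi in (150, 300):
        w, h = page_pixels(8.5, 11, dpi)
        raw = uncompressed_bytes(8.5, 11, dpi, 24)
        print(f"{dpi} dpi: {w} x {h} px, {raw} bytes raw")
```

Halving the resolution cuts the pixel count to a quarter, which is part of why 150 dpi output stays compact.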

Memory usage during OCR processing has been improved by writing intermediate stages to disk, rather than by holding them in memory. Although that results in longer OCR processing time (because disk access is used), I can have a big queue of files being processed in the background and still have full use of the database and other applications. (I uncheck the option to set document attributes in DTPO2 Preferences > OCR so that the queue isn’t interrupted by messages asking me to enter title, etc.)