Alternative, faster OCR?


I use the Fujitsu ScanSnap to scan documents directly to the DEVONthink Inbox.
For OCR, “ABBYY FineReader for ScanSnap” is included.

For a 25-page document, I wait several minutes (i7 processor, 16 GB RAM).

Is there an alternative (paid is fine) that is faster and uses multiple processor cores?


There are other people who can provide a more knowledgeable answer than I can, but here is what I have found as a relatively new user.

I also use a Fujitsu and a MacBook Pro with i7 & Lion 10.7.2, and I often scan and OCR very large documents.

I’ve used Acrobat, ABBYY, PDFPen Pro, and others I can’t remember. In general I find DTPO as fast as, and perhaps faster than, the alternatives (this is my impression, not a methodical comparison). The key for me is accuracy. I am continually amazed at the OCR accuracy of DTPO. I scan a lot of old, ugly documents, and DTPO accurately recognizes documents that choke other OCR programs.

As an aside, I like and usually use Acrobat’s “optimize file” and “reduce file size”. My workflow consists of scanning to a separate folder outside of DTPO. I then use Acrobat to review the document: rotate pages, delete blank pages, delete unneeded pages. Then I optimize, then reduce file size (I reduce file size by making the document compatible with Acrobat 8.0 and later; this often results in significantly smaller files). At this point I import/drag the document into DTPO and let it do its magic.

This process generally works with PDF scans that have been sent to me via email and documents I’ve snatched from the web. (Acrobat will occasionally declare that a document can’t be optimized, but usually these are documents that wouldn’t really benefit from optimization.) And, without getting into an ethical discussion, I use AnyBizSoft PDF Password Remover to allow me to manipulate protected documents.

Thanks for your reply.

You’re right, accuracy is more important than speed. DTPO internally uses the ABBYY FineReader engine (as I understand it), and I tuned its settings for accuracy over speed. That may be why it is so slow. When I use Acrobat’s OCR, it’s much faster, but it seems less accurate.
It is very difficult to compare speed and accuracy “by hand” and find the best option.

So nobody has suggested a better, faster OCR engine. Maybe there is none.

Thank you for the explanation of your workflow too. I scan and save all documents directly to DTPO. But reading yours, one question came to mind.
Would you be faster if you scanned and saved directly to DTPO and then opened/edited the documents in Acrobat from there (by double-clicking)? Perhaps DTPO can be set to open documents in Acrobat instead of its internal viewer (that’s a question).

Double-clicking is reserved for opening a document in its own window inside the database.

A selected document can be opened externally in a chosen application. To do that, Control-click (right click) on the selected document and choose the contextual menu option ‘Open with’, which will present a list of applications capable of opening that document. The application that was defined in the Finder as the ‘parent’ application will be shown as default.

Or, add the ‘Open Externally’ icon to the Toolbar (View > Customize Toolbar). Select a document in a view list and then click on the ‘Open Externally’ icon, which will show the parent application as defined in the Finder, and will open the document in that application.

To define the parent application for a specific file type, such as PDF, select any PDF file in the Finder and open its Info panel. If you wish to make Acrobat the parent application that opens that file when it is double-clicked in the Finder, choose ‘Acrobat’ in the ‘Open with’ menu. Then, immediately below that menu, click ‘Change All…’ to make Acrobat the ‘parent’ application for all PDFs.

The answer, for me, is that it depends on what kind of document I’m scanning and what I’m going to do with it. If I’m going to use Acrobat’s tools, it takes fewer clicks to do it my way. See my question and the great responses I got:

I set up a number of profiles in ScanSnap Manager. When I scan simple, modern documents on quality paper, I scan directly to DTPO; these documents don’t need help from Acrobat. When I scan compromised documents (old, typewritten, faxes, copies of faxes, damaged, …), I have ScanSnap scan to a folder I simply call “Acrobat Scans” and set Acrobat to open the document automatically.

In Acrobat, I scroll through the document, rotate the upside-down pages, and delete blank pages (with thin original documents, the image bleeds through and ScanSnap treats it as the back of a two-sided document; these pages need to be deleted). Then I optimize, then reduce file size. During “reduce file size”, Acrobat asks for a file name, so naming the file takes no extra clicks. It’s at this point that I move the file into DTPO.

It’s not a particular advantage, but this process leaves an original scan in my “Acrobat Scans” folder. At the end of the day, I move the day’s scans into a dated subfolder. I could delete these files as I process them, but I’ve long had the habit of putting things in dated folders for later deletion, a habit I developed before Macs had a Trash from which files could be recovered.

So in the end, speed and ease depend on what you’re trying to accomplish.