OCR and DEVONthink

ScottNYC · October 8, 2009, 4:47am

What are the benefits of having an OCR application like ReadIRIS or ABBYY FineReader Express installed if you already have DEVONthink Pro Office, which has OCR capability? Is there some functionality that ReadIRIS has that DEVONthink doesn't? I'm just trying to figure out whether there would be any benefit to having ReadIRIS installed, or whether it would be redundant.

annard · October 8, 2009, 8:06am

If you have more demanding tasks than converting a file to a searchable PDF with only minor customisations you’ll need a dedicated OCR application.

nestor · October 8, 2009, 8:54am

Just a question: besides PDFs in English, I have a lot of PDFs in Spanish, Italian and other languages that use a lot of diacritics. In such cases the OCR process in DTPO makes a lot of mistakes. For example, "ó" becomes "6", so "resolución" becomes "resoluci6n", etc. I have set the language to "Spanish", but it seems to make no real difference. I ran the same test in Adobe Acrobat at the university and got the same result. Is text in Spanish a "demanding task" for the DTPO OCR?
Thanks,
Nestor

annard · October 8, 2009, 9:05am

That depends on the quality of your scans (we recommend 300dpi and colour). I have some Hungarian and Latvian examples and they have no problems whatsoever.

korm · October 8, 2009, 12:45pm

Is there an available feature comparison between the FineReader instance bundled with DTPO and FineReader Express or other standalone OCR tools?

Is it possible to point the ‘convert to searchable PDF’ action to an OCR tool other than the inbuilt ABBYY tool?

annard · October 8, 2009, 1:11pm

The underlying engine of all the ABBYY-related tools on the Mac is the same. Check these forum archives for comparisons between this engine and the IRIS one we used in version 1.x. Most people seem to have been happier with ABBYY, but then again, the currently shipping ReadIRIS has a more modern OCR engine than the one they licensed to us.

You can do whatever you want with an external OCR application through our extensive Automator and AppleScript support. However, you'll have to access these through the Script menu (which allows for keyboard shortcuts).

povlhp · October 8, 2009, 2:23pm

A real OCR program also allows you to create word processing documents with the formatting more or less intact, so you can edit what you brought in. This is one of the major differences.

You can also select which parts of the text belong together, to help it distinguish the footer from the rest of the text, to help it handle multiple columns better, etc.

I decided to pay for DTPO since the HP software no longer works with Snow Leopard, and on top of that I get some e-mail integration I have not yet tested, and better iPhone websites if I decide to go for that.

twicks · October 8, 2009, 10:25pm

I used to use OmniPage, which hasn't been updated in ages. The big bonus with OP was that you could create little blocks on a page that would be OCR'd sequentially. OP also let you define particular blocks as tables, where you could go in and move boundary lines around to capture the data in a close approximation of the original. There were other little things as well.

The big negative aspect was that there was no way to change the size of the preview font, which was something like 8pt. I actually had to get 2x reading glasses to read the darned thing! Repeated contacts with the developer went unanswered for years.

Another negative was how poorly OP recognized text. See below.

I found the ReadIRIS engine in DTPO 1.x wasn't much better at recognizing text, but I was ecstatic when the DTPO v2 betas were introduced and we were using ABBYY. Of course, the early versions of v2 with ABBYY were messy, but things really improved as the DT team worked with ABBYY to get things right (you should have read our user complaints here in the forum!).

Now I use DTPO exclusively as I find that the recognition rate is extremely high, even on pages where the paper was low grade and the ink bled a bit. I have yet to run up against a situation where the DTPO process doesn’t work, but I should note that I don’t do lots of document scanning these days.

So, as stated earlier, if you need some of the special features in the stand-alone apps, then go for one or the other. You’ve got to use the right tools for your job.