OCR and weak AI

@bws950

Why don't you try the demo?

ABBYY’s separate application in conjunction with DT3

A little; however, I need to buy a newer version

DT3’s inability to correctly identify columns of text

It will probably do a better job automatically; however, if not, you can select columns as separate blocks of text through the GUI. Most docs have the same columns on each page, so the blocks can be reused for the whole or most of the document.

but assumed that since DT3 and DTTG already use the ABBYY SDK, I wouldn’t get improved results.

This is not the case, because you can help FineReader by selecting blocks of text, images, equations (as images; this is a godsend, since the SDK in DEVONthink will always try to convert them to text, which it cannot do), and tables before starting the OCR process. When making a PDF searchable, you get to spellcheck and make corrections before saving, and you can add new words to the dictionary, which means results can improve over time. You can also train FineReader to recognise new patterns, and there are some image-processing options, including de-skew and straighten lines, to get better results.

When converting to different formats such as PDF, Word, Excel, etc., it can do a better job of making the page layout match the original. It can also try to match fonts to those installed on the system.

come up with workflows to solve problems
Well, it's a separate program, and the best results may need some manual input. You could OCR in DEVONthink to get basic search, etc., then, when you need to use the document, run it through FineReader to get the improved OCR results. You will need to save over the same file to keep the item details in DEVONthink attached.

You could use a workflow of tags and review reminders to organise the process, e.g. tags for SDK-OCRed, FineReader-OCRed, Todo, etc. Or you could make it part of importing images and documents, e.g. have an inbox folder for FineReader on your desktop and, after OCRing, save to the inbox folder of DEVONthink for filing later.

The Windows version does offer some better features;
see the comparison chart: PDF Editor Software Price | FineReader PDF
I haven't tried running the Windows version on Windows 11 ARM in Parallels yet.

I wonder why Apple hasn't released its iPhone OCR as a separate web utility?

Because it uses the Neural Engine on the device, and Apple's privacy commitment means it doesn't send any data to the cloud; it's all done on-device.

@bws950

Also take a look at Mathpix; the web and iOS/iPadOS app versions can convert PDFs (not just equations to LaTeX) into several different file formats.

Thanks very much for this – I’ll give the ABBYY application a try and see if I can come up with a workable document flow.

Apple’s OCR (rather: its Vision framework) is scriptable using the ObjC bridge, so you can use it in stand-alone scripts or integrate it with DT or whatever.

However, in my experience it is less reliable than, for example, ABBYY's technique: it has problems recognizing text on the same line as such, which results in scrambled text.
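
For anyone who wants to try it, here is a rough, self-contained Swift sketch of the Vision call the ObjC bridge exposes (VNRecognizeTextRequest). It is only meant to show the shape of the API; the file path is a placeholder, and the same request can be driven from AppleScriptObjC or JXA.

```swift
#!/usr/bin/env swift
// Rough sketch, not a finished script: run Apple's Vision text recognition
// on a single image and print the recognized lines.
import AppKit
import Vision

// Placeholder path; point this at a real scan.
let url = URL(fileURLWithPath: "/path/to/scanned-page.png")
guard let image = NSImage(contentsOf: url),
      let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    fatalError("Could not load image at \(url.path)")
}

let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate       // slower, but better suited to scans
request.usesLanguageCorrection = true      // let Vision apply its own language model

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
do {
    try handler.perform([request])
    let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
    for observation in observations {
        // Each observation is one detected text region; take the top candidate string.
        if let best = observation.topCandidates(1).first {
            print(best.string)
        }
    }
} catch {
    print("Vision request failed: \(error)")
}
```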

I used to use an older version of the standalone ABBYY FineReader for Mac with a great script from a user on the forum, with very good results. Since I upgraded to an M1 Mac and the latest version of FineReader, which is no longer scriptable, I just use the ABBYY OCR engine in DT for convenience.

Here is a link to the other post with the details: Script to OCR PDFs with the latest FineReader - #16 by sawxray

I did some testing.

  • All documents are OCRed under DEVONthink 3 for search, etc., and tagged as basic-ocr
  • Any documents needing significant annotation or conversion I run through ABBYY FineReader for Windows. You do get better results; it is time-consuming, but still way more productive. Tag as full-ocr

I personally think ABBYY FineReader for Windows is better than the Mac version.
The Windows version seems to work fine on Windows 11 for ARM under Parallels.

It's a shame the latest Mac version doesn't even support Shortcuts, which could be called from an AppleScript.

Yes, the Windows version is indeed more full-fledged than the Mac version.

FineReader 15 for Mac reads it as

HOW TO JUDGE A PAINTING
By ALBERT C. BARNES
Dr. Barnes is well known as a collector. His home at Overbrook. Pa., contains the most comprehensive collection of modern pictures in America. It includes fifty Renoirs. His opinion should be of exceptional interest. —ED. NOTE.

and

DEVONthink 3.8 reads it as

HOW TO JUDGE A PAINTING By ALBERT C. BARNES

Dr. Barnes is well known as a collector. His home at Overbrook. Pa., contains the most comprehensive collection of modern pictures in America. It includes fifty Renoirs. His opinion should be of exceptional interest.—ED. NOTE.

Source obtained from Google Books, p. 217.

Google Books' OCR layer is missing its spaces.

What would be useful is the ability to correct the underlying text layer, particularly before making a PDF. This feature is apparently unique to the Windows version. You can't really text-mine a document if it has spelling errors.

Thanks for that – that was using the Google Books version as the source, right? Whereas I was using a copy from a different archive… which is why ABBYY FR for DT worked so much worse on it. I guess I was using it as an example of a text where some kind of more intelligent AI could parse enough of the text's meaning to fix the errors generated by a lousy source…

Yeah, I was using the Google Books version. If your scans are marginal, you might have some success with ScanTailor (especially ScanTailor Advanced, which is multicore and thus faster). I believe there is a Homebrew recipe for it (scantailor-advanced).

I see ABBYY has gone to a subscription model.

Coming back to the question of "weak AI": I (again) experienced the weirdness of what is supposedly "strong AI", aka Google Translate, applied to reviews. In this case, they have context: they know that people are talking about restaurants or hotels or shops in a certain region.

Still, when looking at the translations of reviews for Peruvian restaurants in Google Maps, they use "soles" (as in shoes) or "suns" when trying to translate prices, which are given in Peruvian soles in the original Spanish text, as that is the name of the local currency.

So, if not even Google gets this simple thing right (i.e. figuring out that someone complaining in Spanish about the price of a meal in Peru is not talking about his shoes or the weather), it seems fairly obvious to me that OCR can't be much better, given that it has no context whatsoever when doing its job (as opposed to Google).


Just for fun, if you happen across the text again, pop it into DeepL and see if it does any better (I have a hunch it will).

Well … I still think that even the most modest AI should be able to be weighted to choose POSSIBLE words over impossible ones, and even VERY likely words (in the context of a surrounding sentence or two) over very unlikely ones. Mistakes – possibly quite a few – would still be made, but far fewer than when OCR allows "Bis opinion" and "ba of exceptional" to stand instead of "His opinion" and "be of exceptional"…

You're implying that the context is correctly recognized (i.e. OCRed), for which there's no guarantee. And "ba of exceptional" might refer to a misspelled Bachelor of Arts of exceptional (quality).

What you want is an ex-post analysis of the whole document. Which is understandable, but probably out of range for any current OCR product. They're doing just that: trying to translate pixels to characters. I'm not even sure if they recognize words (i.e. sensible sequences of characters, separated by spaces in Western languages). To achieve what you're looking for, the software would have to first create all these words and then run a wholly different algorithm, namely one that looks at the context by sentence, paragraph, page, and complete document, and then figures out that in this context "ba" probably does not mean Bachelor of Arts because there's no "quality" following the "of".
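
Just to make concrete what that second pass would even have to look like, here is a crude sketch of my own (purely illustrative, not something any OCR product ships): it only does the word-level half, swapping tokens that are impossible according to a tiny, made-up dictionary for a nearby possible one, and it ignores sentence context entirely, which is exactly the hard part.

```swift
import Foundation

// Toy post-correction pass: replace words that are not in a (tiny, hypothetical)
// dictionary with a single-character-substitution candidate that is,
// e.g. "Bis" -> "his", "ba" -> "be". Context is ignored completely.
let dictionary: Set<String> = ["his", "opinion", "should", "be", "of", "exceptional", "interest"]
let alphabet = "abcdefghijklmnopqrstuvwxyz"

// All single-character substitutions of a word.
func substitutions(of word: String) -> [String] {
    var results: [String] = []
    for index in word.indices {
        for letter in alphabet {
            var candidate = word
            candidate.replaceSubrange(index...index, with: String(letter))
            results.append(candidate)
        }
    }
    return results
}

func correct(_ word: String) -> String {
    let lower = word.lowercased()
    if dictionary.contains(lower) { return word }   // already a possible word, keep it
    if let fix = substitutions(of: lower).first(where: { dictionary.contains($0) }) {
        return fix                                  // first in-dictionary candidate wins
    }
    return word                                     // no candidate found, keep the OCR output
}

let ocrOutput = "Bis opinion should ba of exceptional interest"
let corrected = ocrOutput.split(separator: " ").map { correct(String($0)) }.joined(separator: " ")
print(corrected)   // his opinion should be of exceptional interest
```

Even this toy version will happily "correct" into the wrong word whenever several dictionary entries are one letter away, which is exactly where the sentence- and document-level context you describe would have to come in.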

We’re not there yet. And the companies building OCR software would be opening a new Pandora’s box if they were to go down that road. In my opinion.


You're right, of course, that if OCR doesn't even recognize WORDS, then it could not tell when one has gone wrong. But I'm fairly certain that word recognition is pretty far along – so many AIs require it, and successfully perform it, that it shouldn't be much of a challenge.

I think you’re underestimating the difficulty level of such things.
