OCR and weak AI

@bws950

Why don't you try the demo?

ABBYY’s separate application in conjunction with DT3

A little; however, I need to buy a newer version

DT3’s inability to correctly identify columns of text

It will probably do a better job automatically; however, if not, you can select columns as separate blocks of text through the GUI. Most docs have the same columns on each page, so the blocks can be reused for the whole or most of the document.

but assumed that since DT3 and DTTG already use the ABBYY SDK, I wouldn’t get improved results.

This is not the case, because you can help FineReader by selecting blocks of text, images, equations (as images; this is a godsend, since the SDK in DEVONthink will always try to convert them to text, which it cannot do), and tables before starting the OCR process. When making a PDF searchable, you get to spellcheck and make corrections before saving, and you can add new words to the dictionary, which means results can improve over time. You can also train FineReader to recognise new patterns, and there are some image-processing options, including de-skew and straighten lines, to get better results.

When converting to different formats such as PDF, Word, Excel, etc., it can do a better job of making the page layout match the original. It can also try to match fonts to those installed on the system.

come up with workflows to solve problems
Well, it's a separate program, and the best results may need some manual input. You could OCR in DEVONthink to get basic search, etc., then, when you need to use the document, run it through FineReader to get the improved OCR results. You will need to save over the same file to keep the item details in DEVONthink attached.

You could use a workflow of tags and review reminders to organise the process, e.g. tags for SDK-OCRed, FineReader-OCRed, Todo, etc. Or you could make it part of importing images and documents, e.g. have an inbox folder for FineReader on your desktop and, after OCRing, save to the inbox folder of DEVONthink for filing later.

The Windows version does offer some better features;
see the comparison chart: PDF Editor Software Price | FineReader PDF
I haven't tried running the Windows version on Windows 11 ARM in Parallels yet.

I wonder why Apple hasn't released its iPhone OCR as a separate web utility?

Because it uses the Neural Engine on the device, and Apple's privacy commitment means it doesn't send any data to the cloud; it's all done on-device.

@bws950

Also take a look at Mathpix; the web and iOS/iPadOS app versions can convert PDFs (not just equations to LaTeX) into several different file formats.

Thanks very much for this – I’ll give the ABBYY application a try and see if I can come up with a workable document flow.

Apple’s OCR (rather: its Vision framework) is scriptable using the ObjC bridge, so you can use it in stand-alone scripts or integrate it with DT or whatever.

However, in my experience it is less reliable than, for example, ABBYY's technique: it has problems recognizing text on the same line as such, which results in scrambled text.
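
For anyone who wants to try it, here is a rough, self-contained Swift sketch of the Vision call the ObjC bridge exposes (VNRecognizeTextRequest). It is only meant to show the shape of the API; the file path is a placeholder, and the same request can be driven from AppleScriptObjC or JXA.

```swift
#!/usr/bin/env swift
// Rough sketch, not a finished script: run Apple's Vision text recognition
// on a single image and print the recognized lines.
import AppKit
import Vision

// Placeholder path; point this at a real scan.
let url = URL(fileURLWithPath: "/path/to/scanned-page.png")
guard let image = NSImage(contentsOf: url),
      let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    fatalError("Could not load image at \(url.path)")
}

let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate       // slower, but better suited to scans
request.usesLanguageCorrection = true      // let Vision apply its own language model

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
do {
    try handler.perform([request])
    let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
    for observation in observations {
        // Each observation is one detected text region; take the top candidate string.
        if let best = observation.topCandidates(1).first {
            print(best.string)
        }
    }
} catch {
    print("Vision request failed: \(error)")
}
```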

I used to use an older version of the standalone ABBYY FineReader for Mac with a great script from a user on the forum, with very good results. Since I upgraded to an M1 Mac and the latest version of FineReader, which is no longer scriptable, I just use the ABBYY OCR engine in DT for convenience.

Here is a link to the other post with the details: Script to OCR PDFs with the latest FineReader - #16 by sawxray

I did some testing.

  • All documents are OCRed under DEVONthink 3 for search, etc., and tagged as basic-ocr
  • Any documents needing significant annotation or conversion I run through ABBYY FineReader for Windows. You do get better results; it is time-consuming, but still way more productive. Tag as full-ocr

I personally think ABBYY FineReader for Windows is better than the Mac version.
The Windows version seems to work fine on Windows 11 for ARM under Parallels.

It's a shame the latest Mac version doesn't even support Shortcuts, which could be called from an AppleScript.

Yes, the Windows version is indeed more full-fledged than the Mac version.

FineReader 15 for Mac reads it as

HOW TO JUDGE A PAINTING
By ALBERT C. BARNES
Dr. Barnes is well known as a collector. His home at Overbrook. Pa., contains the most comprehensive collection of modern pictures in America. It includes fifty Renoirs. His opinion should be of exceptional interest. —ED. NOTE.

and

DEVONthink 3.8 reads it as

HOW TO JUDGE A PAINTING By ALBERT C. BARNES

Dr. Barnes is well known as a collector. His home at Overbrook. Pa., contains the most comprehensive collection of modern pictures in America. It includes fifty Renoirs. His opinion should be of exceptional interest.—ED. NOTE.

Source obtained from Google Books, p. 217.

Google Books' OCR layer is missing its spaces.

What would be useful is the ability to correct the underlying text layer, particularly before making a PDF. This feature is apparently unique to the Windows version. You can't really text-mine a document if it has spelling errors.

Thanks for that – that was using the Google Books version as the source, right? Whereas I was using a copy from a different archive… which is why ABBYY FR for DT worked so much worse on it. I guess I was using it as an example of a text where some kind of more intelligent AI could parse enough of the text's meaning to fix the errors generated by a lousy source…

Yeah, I was using the Google Books version. If your scans are marginal, you might have some success with ScanTailor (especially ScanTailor Advanced, which is multicore and thus faster). I believe there is a Homebrew recipe for it (scantailor-advanced).

I see ABBYY has gone to a subscription model.

Coming back to the question of "weak AI": I (again) experienced the weirdness of what is supposedly "strong AI", aka Google Translate, applied to reviews. In this case, they have context: they know that people are talking about restaurants or hotels or shops in a certain region.

Still, when looking at the translations of reviews for Peruvian restaurants in Google Maps, they use "soles" (as in shoes) or "suns" when trying to translate prices, which are given in Peruvian soles in the original Spanish text, as that is the name of the local currency.

So, if not even Google gets this simple thing right (i.e. figuring out that someone complaining in Spanish about the price of a meal in Peru is not talking about his shoes or the weather), it seems fairly obvious to me that OCR can't be much better, given that it has no context whatsoever when doing its job (as opposed to Google).


Just for fun, if you happen across the text again, pop it into DeepL and see if it does any better (I have a hunch it will).

Well … I still think that even the most modest AI should be able to be weighted to choose POSSIBLE words over impossible ones, and even VERY likely words (in the context of a surrounding sentence or two) over very unlikely ones. Mistakes – possibly quite a few – would still be made, but far fewer than when OCR allows "Bis opinion" and "ba of exceptional" to stand instead of "His opinion" and "be of exceptional"…

You're implying that the context is correctly recognized (i.e. OCRed), for which there's no guarantee. And "ba of exceptional" might refer to a misspelled Bachelor of Arts of exceptional (quality).

What you want is an ex-post analysis of the whole document. Which is understandable, but probably out of range for any current OCR product. They're doing just that: trying to translate pixels to characters. I'm not even sure if they recognize words (i.e. sensible sequences of characters, separated by spaces in Western languages). To achieve what you're looking for, the software would have to first create all these words and then run a wholly different algorithm, namely one that looks at the context by sentence, paragraph, page, and complete document, and then figures out that in this context "ba" probably does not mean Bachelor of Arts because there's no "quality" following the "of".
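
Just to make concrete what that second pass would even have to look like, here is a crude sketch of my own (purely illustrative, not something any OCR product ships): it only does the word-level half, swapping tokens that are impossible according to a tiny, made-up dictionary for a nearby possible one, and it ignores sentence context entirely, which is exactly the hard part.

```swift
import Foundation

// Toy post-correction pass: replace words that are not in a (tiny, hypothetical)
// dictionary with a single-character-substitution candidate that is,
// e.g. "Bis" -> "his", "ba" -> "be". Context is ignored completely.
let dictionary: Set<String> = ["his", "opinion", "should", "be", "of", "exceptional", "interest"]
let alphabet = "abcdefghijklmnopqrstuvwxyz"

// All single-character substitutions of a word.
func substitutions(of word: String) -> [String] {
    var results: [String] = []
    for index in word.indices {
        for letter in alphabet {
            var candidate = word
            candidate.replaceSubrange(index...index, with: String(letter))
            results.append(candidate)
        }
    }
    return results
}

func correct(_ word: String) -> String {
    let lower = word.lowercased()
    if dictionary.contains(lower) { return word }   // already a possible word, keep it
    if let fix = substitutions(of: lower).first(where: { dictionary.contains($0) }) {
        return fix                                  // first in-dictionary candidate wins
    }
    return word                                     // no candidate found, keep the OCR output
}

let ocrOutput = "Bis opinion should ba of exceptional interest"
let corrected = ocrOutput.split(separator: " ").map { correct(String($0)) }.joined(separator: " ")
print(corrected)   // his opinion should be of exceptional interest
```

Even this toy version will happily "correct" into the wrong word whenever several dictionary entries are one letter away, which is exactly where the sentence- and document-level context you describe would have to come in.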

We’re not there yet. And the companies building OCR software would be opening a new Pandora’s box if they were to go down that road. In my opinion.


You're right, of course, that if OCR doesn't even recognize WORDS, then it could not tell when one has gone wrong. But I'm fairly certain that word recognition is pretty far along – so many AIs require it, and successfully perform it, that it shouldn't be much of a challenge.

I think you’re underestimating the difficulty level of such things.
