DT's OCR and AI/LLM?

Hi Devonthinkers – A bunch of papers on the Internet say that it should now be possible to harness Large Language Models to vastly improve OCR. (Even when I simply ask Google’s Bard to “correct” a typically faulty text yielded by DT’s ABBYY reader, it easily gets rid of ALL of its errors, without even needing to see the original image that was OCR’d. I guess it substitutes statistically likely readings for the statistically impossible ones in the OCR’d text.)
Does anyone know if there’s a way – or soon to be a way – to implement/automate AI correction of documents OCR’d within DT? It seems that ABBYY has some such pairing of AI and OCR, called “ABBYY Vantage”, but as a non-specialist (an art historian, in fact) I haven’t been able to tell if it’s what I need.
Any thoughts much appreciated. I OCR a lot of historical documents, and then use DT to do Boolean searches of their contents, but the sheer number of errors in the OCR almost guarantees that my searches miss a lot of relevant material.

I don’t, but did you try putting your text through LanguageTool (which is free for up to 3000 characters)? I find that quite helpful, though it doesn’t (fortunately) autocorrect. And I have no idea how that works with old languages.

ABBYY Vantage looks rather business-oriented, and you have to train it. I guess your case is really more about the probability of one word following another, whereas Vantage has to consider different regions of text on a page.

Thanks. My texts are all in modern English, in fact, so language isn’t a problem. Just looking for basic text-recognition, which is surprisingly hard for letter-by-letter (non-contextual, non-semantic) OCR to get right. I think GPT-4 might correct uploaded texts even better than LanguageTool – but feeding each of my thousands of PDFs from DT into GPT would be too laborious to be practical. Hence my question about somehow automating such a process into DT. (I’m so tech-ignorant that I’m afraid I have no idea what that would involve, or if it’s just an absurd ask of a database program.)
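For what it's worth, the kind of automation asked about here could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it assumes the OCR'd text has already been exported from DT as plain text, and `correct_chunk` is a hypothetical placeholder for whatever correction step is used (an LLM call, LanguageTool, or anything else). None of the names below come from DT, ABBYY, or any LLM vendor's API, and the 3000-character default is just an illustrative chunk size.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 3000) -> List[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    Paragraphs are never split, so a single paragraph longer than
    max_chars ends up as an oversized chunk of its own.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def correct_document(
    text: str,
    correct_chunk: Callable[[str], str],
    max_chars: int = 3000,
) -> str:
    """Run the (hypothetical) correct_chunk step over each chunk and rejoin."""
    corrected = [correct_chunk(c) for c in split_into_chunks(text, max_chars)]
    return "\n\n".join(corrected)


def toy_corrector(chunk: str) -> str:
    """Stand-in for a real correction service; fixes one known OCR slip."""
    return chunk.replace("tbe", "the")


if __name__ == "__main__":
    ocr_text = "Tbis is tbe first paragraph.\n\nAnd tbe second one."
    print(correct_document(ocr_text, toy_corrector, max_chars=40))
```

Chunking on paragraph boundaries keeps each request small enough for services with input limits while giving the corrector enough surrounding context to judge which readings are likely. Getting the text out of DT and back in again is a separate step that this sketch does not cover.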

OTOH, ChatGPT/OpenAI would certainly be grateful for another bunch of free training material.

I seem to remember that @rkaplan published a script here – did you search the forum?

It was here