Anyone happen to know why OCR will very often choose impossible spellings of non-existent words over more likely spellings? Shouldn’t it use AI to help it figure out what a piece of text is likely to be?
I.e., instead of “reading” a piece of text as “.eo Stein, Ihe Art in Painting,” why doesn’t it use AI, and the vast store of previously written English texts, to read it as “Leo Stein, The Art in Painting”?
OCR is making “guesses” anyway at which letterforms to read from the patterns on the page, so you’d think it would use AI to inform those guesses about which letterforms are more likely to be correct, given the surrounding context of the word and sentence… and the nature of previously encountered English texts!
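To make concrete the kind of thing I have in mind, here is a rough Python sketch. It is only a toy under my own assumptions, not how any real OCR engine works: the `CONFUSIONS` table and the tiny `LEXICON` are made-up stand-ins for a real model of confusable glyphs and a real corpus of English text, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical table of glyphs an OCR pass might confuse (illustrative only).
CONFUSIONS = {
    ".": ["L", "l"],
    "I": ["T", "l"],
    "l": ["I", "1"],
    "0": ["O"],
    "1": ["l", "I"],
}

# Toy stand-in for "previously written English texts": word -> rough frequency.
LEXICON = {"leo": 5, "stein": 5, "the": 1000, "art": 50, "in": 800, "painting": 30}


def candidates(token):
    """Yield every spelling reachable by swapping confusable glyphs."""
    options = [[ch] + CONFUSIONS.get(ch, []) for ch in token]
    for combo in product(*options):
        yield "".join(combo)


def correct_token(token):
    """Keep the raw reading unless a confusable variant is a known word."""
    core = token.rstrip(",.;:")      # set trailing punctuation aside
    trailing = token[len(core):]
    if core.lower() in LEXICON:      # already a plausible word
        return token
    best = max(candidates(core), key=lambda c: LEXICON.get(c.lower(), 0))
    if LEXICON.get(best.lower(), 0) == 0:
        return token                 # nothing better found; leave it alone
    return best + trailing


def correct_line(line):
    """Apply the per-token correction to a whitespace-separated line."""
    return " ".join(correct_token(tok) for tok in line.split())


print(correct_line(".eo Stein, Ihe Art in Painting"))
```

This only scores each word on its own against a word list. Using the surrounding sentence as context, as I describe above, would need a proper language model in place of the `LEXICON` lookup.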
Sorry if this sounds like idle curiosity on my part, but I am always deeply frustrated at the many errors that are STILL present in OCR’d docs, a decade after I started OCR’ing, so I’d like to at least understand why those errors “have” to be there.