edit the OCRed text layer of a PDF

jomu · May 24, 2013, 6:37am

I searched the forum for the above subject and found a few people posting the same problem, i.e. wishing to edit an incorrectly OCRed word, but found no solutions.

If there is a solution, please let us know. If not, and if it is feasible, please make it possible.

Thanks,

Joseph

Bill_DeVille · May 24, 2013, 4:23pm

There really is not a satisfactory way to edit OCR errors in the text layer of a PDF. I’ve been wishing for that for many years, but doubt that it will happen, as Adobe’s PDF format isn’t conducive to a text layer editing approach. It’s not something that DEVONtechnologies will address.

Some OCR apps provide the option of making the text conversion display in the image layer of the resulting searchable PDF, so that Adobe’s allowance of very limited text correction of the image layer could be applied. I don’t approve of that option, as it could result in changes to documents, such as contracts, where any deviation from the original could have serious consequences. IMHO, the ability of the image layer to faithfully represent the original paper copy is critically important, even if there are OCR errors in the text layer.

I’ve got a couple of PDFs resulting from scanning legal documents that are important to me, but contain OCR errors resulting from blemishes and/or handwritten markups in the documents. As a result, some names and terms are not searchable.

A workaround in such a case, to provide a fully searchable version of the document, is to select the PDF, choose Data > Convert > to plain text. An editable text document will result, and the errors can be corrected in that copy, by reference to the image of the PDF. That’s an aid to searching, but of course the PDF file should be retained as the more important file were questions to arise, such as the terms of a contract, etc.