Editing the text layer of a PDF


Is it possible to edit the text layer in a PDF+text file? Although OCR usually works well, it sometimes makes mistakes. I wondered whether it is possible to modify the text layer in the PDF itself to correct some of them.


Manuel Aguilar Hendrickson

I have been wishing for the ability to edit the text layer of a PDF for years. Still an unanswered wish.

OCR is a tricky business. The software looks at a picture of characters and words, compares their images to a library of ‘known’ characters and/or words, and produces a text conversion of the image.

The quality of the image is critical, and recognition errors can arise for a number of reasons. A scan made at low resolution may be blurry or pixelated, making characters hard to recognize. Small or non-standard fonts can confuse recognition, as can artifacts such as blotches, stains or smears, and handwritten marks such as underlining. Highlighting can make characters unrecognizable.

If OCR errors affect critical information such as one or more terms that are important for searches, a workaround is to enter that text into the Comment field of the document’s Info panel.

You know, Adobe Acrobat offers this capability. However, I think its editing window is too small for anything beyond correcting misspellings and other small errors, though the same could be said of many applications’ “note taking” interfaces, Skim included.

Best, Charles