Simply put:
- I want to correct the OCR text in hundreds of short (2-3 sentence) PDF files I have from 17th- and 18th-century newsprint (this is genealogy related).
- Acrobat’s Correct Recognized Text is not an option. I get a “there were no errors” popup when I attempt this.
- As a workaround, I can convert the PDF to a plain text file in DT, which produces a copy of the PDF as a .txt file, and then edit that copy to build a verbatim transcription of the original. I could then paste those sentences back into the original, i.e. into a metadata field created for this purpose, but this is where my concern emerges. I’m worried about certain words/phrases now being counted twice: once from the text that was correctly OCR’ed, and once from my manual transcription of the same text.
For instance, if the original PDF’s OCR actually did recognize Mr. Manfrengensen’s surname in 1690 newsprint, and that same word is now also pasted into metadata, does DT end up scoring this entire file higher in future search results because that word now gets two hits within the same file? (My goal is to avoid this!)
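To make the worry concrete, here is a rough Python sketch of what I mean by "counted twice". It assumes a naive per-file word count, which is only my guess at the kind of thing a search index might do; I don't know how DT actually scores results. The sample sentences and the `term_counts` helper are made up for illustration.

```python
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text and count each word (naive tokenization)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Hypothetical sample: the OCR layer got the surname right but misread "lately".
ocr_text = "Mr. Manfrengensen of this towne hath latelv arrived"
# Hypothetical corrected transcription pasted into a metadata field.
corrected = "Mr. Manfrengensen of this towne hath lately arrived"

ocr_only = term_counts(ocr_text)
combined = term_counts(ocr_text + " " + corrected)

# Words present in both the OCR layer and the transcription, i.e. the
# ones that would be counted twice if both texts are indexed together.
doubled = sorted(w for w in term_counts(corrected) if ocr_only[w] > 0)

surname = "manfrengensen"
print("OCR layer only:", ocr_only[surname])
print("OCR layer + metadata:", combined[surname])
print("Counted twice:", doubled)
```

Under that assumption the surname goes from one hit to two, while only the words the OCR got wrong (here "lately") appear once in their corrected form.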
Thanks fellas! I’m open to any and all ways of pulling this off, so if a better workflow exists, I’d love to hear of it. Thanks in advance!