How to change the text in PDF+text documents

When OCRing a paper document with Readdle’s Scanner Pro, the resulting electronic document is of the kind PDF+Text.

The text part is not always perfect. - Is it possible to edit this text component? If there is no way to do this in DTP, does anybody know of external (command line) tools to edit or inject other text?

DEVONthink doesn’t support editing of the text layer.

Not a perfect solution, but you can convert the PDF+Text document to an RTF or plain text document (Data > Convert). These can be corrected so accurate text can be made available for See Also etc (and of course you can embed a link to the original).

Not ideal, but perhaps worth it for some documents.
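
For anyone who goes the plain-text route, here is a minimal sketch of batch-fixing recurring OCR misreads in the converted text. It assumes you have already exported the document as plain text (Data > Convert); the `CORRECTIONS` table and the `correct_ocr_text` name are made up for illustration, so replace the entries with the errors you actually see in your own scans.

```python
import re

# Hypothetical lookup table: OCR misread -> intended word.
# These entries are examples only, not taken from any real scan.
CORRECTIONS = {
    "lnvoice": "Invoice",
    "T0tal": "Total",
    "arnount": "amount",
}


def correct_ocr_text(text, corrections=CORRECTIONS):
    """Replace whole-word OCR misreads in text using a lookup table."""
    if not corrections:
        return text
    # Try longer keys first so a short key never shadows a longer one.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)


if __name__ == "__main__":
    sample = "lnvoice T0tal: arnount due"
    print(correct_ocr_text(sample))
```

This only touches the converted text copy; the text layer inside the original PDF stays as it was. Matching on whole words keeps the table from rewriting parts of longer words by accident.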

PDF Expert for the Mac is not cheap, but it allows you to edit the text of a PDF. I’ve used this in a surprisingly large number of instances and it’s basically like magic. If you find yourself needing to edit the text of PDFs often, it’s worth the investment for that feature alone.

Well, I didn’t expect DTP to be able to do it (a bit out of scope, I reckon), but thanks heaps, scottlougheed for your tip with PDF Expert.

Is the Mac version able to edit the hidden text, which Scanner Pro somehow puts “behind” the scanned image? The iOS version cannot do this.

No, editing the Scanner Pro-OCRd file in PDF Expert for macOS seems to have the same problem that PDF Expert for iOS has.

Thanks, korm. - This means I’m back to square one: No way to edit the OCRed text of a PDF…

Actually, PDF Pen from Smile allows you to edit the OCR layer (that is, the “hidden” layer of recognized text). Haven’t done a ton of that, so the sample is small, but I’ve yet to encounter a PDF with an OCR layer it can’t edit (I have encountered some where the OCR layer is hopelessly garbled, but that’s not PDF Pen’s problem!).
Here’s an invoice scanned and OCR’d with Scanner Pro, viewed in PDF Pen Pro using the “View OCR Layer” view.
[Screenshot: Screen Shot 2017-08-21 at 10.16.50 PM.png]

The scan is obviously black and white and the OCR’d text is represented in blue; you can just barely see the ghost of the actual image below the OCR layer. All of that blue text is editable, but changes made to the OCR layer obviously won’t be reflected in the image layer (but, of course, they will be reflected in searches!).

Now if only we could get all those features into a single app!
