PDF editor, correcting OCR

SlickSlack · January 15, 2022, 6:58pm

There seems to be a lot of experience on here with the various PDF manipulation tools.
I was recently given access to some hurriedly done photo-scans of a very hard-to-find book that has been out of print for 15 years. I don’t intend to distribute the results of my work publicly, but I do intend to share it with a few people with similar interests. It’s a cookbook for a restaurant that no longer exists except in the memories of all the people who ate there.

The photo-scans were done a little haphazardly, so they required lots of de-skewing and cropping/stamping out fingertips, most of which I have accomplished in Photoshop. I’ve got it to the point where it’s readable as a 104-page PDF, but the skew and other distortions are never completely gone.

I’ve run OCR on it a few different ways (DEVONthink, Acrobat, QuickScan), and the results are all quite far off, especially on the worst-quality pages.

My Question:
What are the options for editing the OCR’ed text without changing how the scans look? I was aiming for just an accurate, searchable text layer on each page, but Acrobat leads one down the road of editing all the graphics and text blocks. DT doesn’t really give you access to the OCR text layer.
Hope this isn’t too far off topic.

Blanc · January 15, 2022, 7:05pm

Does this document on editing & correcting OCR in Acrobat DC help?

mbbntu · January 15, 2022, 7:08pm

In the past, I always got the best OCR from PDFpen Pro: https://pdfpen.com/pdfpenpro/.

I haven’t tested it against others for some years now, so the situation may have changed. However, it might be worth trying.

SlickSlack · January 15, 2022, 8:01pm

Thanks!
That doc is far better than anything I came up with on the Adobe site. Acrobat really comes across as a distant relative of the rest of the Adobe suite when it comes to UX and overall feel. Support is also much harder to get results from.

Acrobat still fights you if you just want to say “look, I know your OCR can’t figure out this distorted image, so let me just type it out for you using my flawed human eyes and brain, and we can all go home faster.”

I think I’ve used up my free trial of PDFpen on another project. This project is sort of a one-off, so I don’t think I’ll be going down that road. Thanks, though.

Blanc · January 15, 2022, 8:13pm

Period. I hate it, but have found nothing else which can get everything done I need.

mbbntu · January 15, 2022, 8:14pm

Just so you know, PDFpen is no longer with Smile Software. It was bought by another company. It may be that you can get a trial version again.