OCR select sections of microfilmed newspapers

CDdevon · November 9, 2021, 4:39pm

Hi there! I’m new to DEVONthink, so apologies if this question has been answered somewhere else. I’m wondering if there is a way to select sections of PDF files of newspapers so that I can OCR only those sections. I am looking at digital copies of microfilmed newspapers. The images are not the clearest (i.e. very blurry in certain sections, lots of fold marks), most likely because the originals were brittle to begin with. Given this, I think OCR will only work on certain sections. I’m having trouble figuring out which “select tool” I need for this, and also how to then ask DEVONthink to OCR the sections I’ve selected.

Basically, I’m trying to use DEVONthink so that I don’t have to transcribe everything.

Thanks for your help!

BLUEFROG · November 9, 2021, 4:46pm

Welcome @CDdevon

Sorry, but zone-based OCR is currently not possible in DEVONthink. The request is noted, however.

ipanini · November 9, 2021, 5:28pm

What jumps to mind:

In DEVONthink you can fairly easily copy out a single page or a number of pages, create a new PDF from them, and try OCR on that
or
screenshot the part that you need > open it with Preview, rotate it if necessary > do the OCR
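
The screenshot-and-crop route can also be scripted outside DEVONthink. Below is a minimal sketch, not anything DEVONthink itself offers: it assumes you have exported the page as an image, and that Pillow, pytesseract and the Tesseract binary are installed (Tesseract is a swapped-in OCR engine here, not the one DEVONthink uses). The names `Zone`, `zone_to_pixels` and `ocr_zone` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangle given as fractions (0.0 to 1.0) of the page width and height."""
    left: float
    top: float
    right: float
    bottom: float


def zone_to_pixels(zone: Zone, page_width: int, page_height: int) -> tuple[int, int, int, int]:
    """Convert a fractional zone into a (left, upper, right, lower) pixel box."""
    horizontal_ok = 0.0 <= zone.left < zone.right <= 1.0
    vertical_ok = 0.0 <= zone.top < zone.bottom <= 1.0
    if not (horizontal_ok and vertical_ok):
        raise ValueError(f"zone is empty or outside the page: {zone}")
    return (
        round(zone.left * page_width),
        round(zone.top * page_height),
        round(zone.right * page_width),
        round(zone.bottom * page_height),
    )


def ocr_zone(image_path: str, zone: Zone) -> str:
    """Crop one zone out of a page image and OCR only that crop.

    Assumption: Pillow and pytesseract are installed and the Tesseract
    binary is on the PATH. They are imported here so the geometry helper
    above still works without them.
    """
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as page:
        box = zone_to_pixels(zone, *page.size)  # page.size is (width, height)
        return pytesseract.image_to_string(page.crop(box))
```

For example, `ocr_zone("page_12.png", Zone(0.0, 0.25, 0.5, 0.75))` would OCR only the left half of the middle band of a hypothetical `page_12.png`. Fractions are used instead of pixels so the same zone can be reused on scans of different resolutions; how well it reads blurry microfilm will depend on the scan.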

In the distant past I used ABBYY; there I could select rectangles, columns, etc. and let it do its thing. If I’m not mistaken (Jim will correct me if I’m wrong), the OCR engine in DEVONthink is based on ABBYY, or is ABBYY’s.

I’ve also had fairly good results with PDFpen Pro. I don’t use it every day, so I’m not aware of its current status, but it had an interesting capability to order / reorder and at the same time create a document index where you could put links to specific pages yourself.

Good luck!