Some Feedback on OCR Accuracy

Thanks for the suggestions.

300dpi and 80% quality don’t seem to create an obvious degradation in quality. The image was about 10 MB vs. ~6 MB in Acrobat; I can live with that.

I agree that for things like bills, 150dpi and 50% quality work great; I don’t care if the visual display quality drops on those files.

For me the more important issue is archiving journal articles that aren’t available in electronic format (e.g., Environmental Ethics doesn’t offer an electronic version, so the only way to get a searchable archive is to scan and run OCR). The paper issues take up lots of shelf space and are expensive to move. For those, it looks like I’ll be using the 300dpi/80% option or higher.

I think I’ll just keep copies of the original scanned files for works such as books. After investing time and resources in scanning my book library, it’s unfortunate that the options are to degrade the quality of the original scan (lose data) or accept larger file sizes. But that appears to be the state of the science of OCR at the moment. For what it’s worth, I really don’t like the loss of visual quality that comes with 150dpi/50% quality, and would not want to read a book at that quality on screen, but that’s a preference.

I’ll experiment again with recognizing a book with DT and report back. Previously DT’s OCR has failed on large projects, so I’ve turned to Acrobat for those (if you do it in batch mode, you at least don’t have to pay the penalty of Acrobat drawing the screen for each page, so it’s less processor intensive, similar to the (good) way DT handles scanning).

Thanks again for the suggestions.

I’ve restricted myself to journal subscriptions that provide online access. My speciality is environmental sciences and policy, and there are enough online resources to keep me quite satisfied. I no longer opt to receive paper copies.

In the past I subscribed to several weekly journals, such as Science and Nature. The paper copies took up lots of bookshelf space and weighed hundreds of pounds. Worst of all, they mostly just sat on the shelves, since for years I’ve worked with the online resources anyway. When I moved in 2007 I found a library that would accept them.

Let us know your experience with Acrobat OCR. I’ve compared Acrobat and DTPO2 pb4 OCR of the same PDFs, and overall I give the edge in recognition accuracy to DTPO2’s ABBYY OCR. The early problems with dropouts have been solved.