Upgrade and OCR

I’m thinking about upgrading to DTP Office, and I was wondering if the OCR was good enough to read images from books and official documents captured on a digital camera.


Yes, if the images are good enough to allow accurate OCR. Yours is a ‘chicken and egg’ question. :slight_smile:

Professional equipment to produce digital copies of printed material generally uses digital cameras, and can be quite expensive. But many of today’s consumer and prosumer digital cameras can do a good job. I’ve used a 5-megapixel Ricoh camera that includes a special TIFF mode for copying text, and a 12-megapixel Canon EOS Rebel DSLR.

I’ve gotten good OCR of TIFF images taken with a digital camera on a copy stand. High-resolution JPEG images also work. Avoid image formats such as RAW, as the ABBYY OCR can’t read them without conversion. The camera should have a good lens and an image resolution of 5 megapixels or higher. Good lighting is important. For copying book pages, it may be necessary to flatten them under a non-reflective sheet of glass, to prevent ‘bowing’ of the pages and the resulting curvature of the lines of text.
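For anyone who wants to pre-screen a batch of camera shots before importing them, here is a minimal sketch of the checks described above (file format and the 5-megapixel floor). It is only an illustration: the function names, the list of RAW extensions, and the warning wording are my own assumptions, not anything DTP Office or ABBYY provides, and it does not measure lens quality, lighting, or page curvature.

```python
from pathlib import Path

# Assumption: a few common camera RAW extensions; this list is not exhaustive.
RAW_EXTENSIONS = {".raw", ".cr2", ".nef", ".arw", ".dng"}
# Formats reported in the post above as giving good OCR results.
OCR_FRIENDLY_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg"}
# Minimum resolution suggested in the post above.
MIN_MEGAPIXELS = 5.0


def megapixels(width_px: int, height_px: int) -> float:
    """Convert pixel dimensions to megapixels."""
    return width_px * height_px / 1_000_000


def check_capture(filename: str, width_px: int, height_px: int) -> list[str]:
    """Return a list of warnings; an empty list means these checks passed."""
    warnings = []
    ext = Path(filename).suffix.lower()
    if ext in RAW_EXTENSIONS:
        warnings.append("RAW file: convert to TIFF or JPEG before OCR")
    elif ext not in OCR_FRIENDLY_EXTENSIONS:
        warnings.append(f"untested format {ext!r}")
    if megapixels(width_px, height_px) < MIN_MEGAPIXELS:
        warnings.append("resolution below 5 megapixels")
    return warnings


if __name__ == "__main__":
    # Hypothetical file names and dimensions, for illustration only.
    for name, w, h in [("page_001.tif", 2592, 1944), ("page_002.CR2", 4272, 2848)]:
        print(name, check_capture(name, w, h) or "looks fine")
```

You supply the pixel dimensions yourself here (most image viewers show them), which keeps the sketch free of any imaging library.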

If I had to digitize books, I would use a digital camera on a copy stand instead of a flatbed scanner. It’s much faster to flip pages this way than to pick up the book and reposition it on a flatbed scanner for every page change.

Don’t ask me for recommendations of copy stands (I’m looking but haven’t decided). I had cobbled together my own setup from lab equipment, but I’ve not been able to find all the pieces since the move to my cabin. I’ve had a couple of my old books digitized recently, and cheated. I had them digitized at a local print services shop, taking advantage of their professional digital camera setup with automatic page flipping. ABBYY would have done a good job on the resulting PDFs, but the print shop performed OCR for me, as well. :slight_smile:

Wow! Another wonderfully instructive essay by Bill. Cabin living must stimulate the gray matter, huh?

Regarding the whole scan with intent to OCR scenario, there are three ways you can accomplish this.

  1. Take photos with your digital camera.

  2. Spend a lot and follow Bill’s route: have a print services shop do the whole project professionally.

  3. If you have a scanner with an automatic document feeder (ADF), and if you can stand to have the book destroyed, take it to a print services shop and have them slice off the spine, giving you a stack of loose pages that can be fed into the ADF.

I recently paid $20 to have the spines sliced off about 27 paperbacks, and the ease of scanning the loose pages into DTPO is incomparable.

And I’ll put a plug in here for the ABBYY OCR engine: Top notch. Such accuracy I never thought possible. Thanks to DevonTech for switching to ABBYY!

This is great. Now, I have to find a way to get the ScanSnap past my wife’s credit card…


Jeff: there’s always that hurdle to clear! I almost had to do a Keynote presentation to mine to illustrate how great it would be to file all her paper junk using an ADF.

My big sell at the moment is how to justify an i7 27" iMac when we really don’t need it. I hear gold’s still over $1,000/oz and I’ve got a few ounces in my mouth. Yuck! :angry: