[DT 2.0pb3] OCR of traffic ticket doesn't recognize key info

Dear forum,

Summary: DT OCR of traffic ticket fails to recognize all NON pre-printed information, e.g., ticket number, issue date, issue time, state, plate number, etc.

Config: ScanSnap 510M auto-handoff to DT 2.0pb3, MacBook early 2007, white, 4GB RAM, 2.0 GHz Intel Core 2 Duo. ScanSnap Manager set for Excellent scanning, allow auto blank page removal, correct skewed char strings, allow auto image rotation. DT set for 300 dpi, quality 100%, original document Move to Trash, set attributes UNSET, Primary language English, no secondaries selected.

Activity: auto-scan traffic ticket directly into DT.

Hoped for result: OCR allows ticket to be searched via DT and Spotlight.

Actual result: OCR recognizes from ~315 to ~350 words in the ticket, depending on scan settings in ScanSnap Manager. Unclear whether DT scan settings affect number of recognized words. All of the “interesting” information on the ticket – ticket number, issue date, issue time, state, plate no, etc. – i.e., the information that is NOT pre-printed – goes UNRECOGNIZED.

Is this a feature or a bug? Is there any workaround or fix I can implement?

Thank you in advance for your time reading this posting and any response you may supply!

Alan

Currently, the settings in DTPO2 OCR preferences do not affect the accuracy of recognition. But scanner settings, especially those involving scan resolution, contrast, etc., certainly can affect OCR recognition accuracy.

The ABBYY OCR engine included in DTPO2 does not recognize handwritten text.

Handwritten text presents many more recognition problems than does printed text, as there are many more variations in people’s handwriting than in typical fonts and font styles used in printed material. Some form of training is usually required to improve recognition of an individual’s handwriting. Either the software includes training via error correction, or the individual is self-trained to accommodate the eccentricities of the OCR software and improve handwritten entry. :slight_smile:

I use a ModBook (a custom Mac tablet with digitizer pen input). Printed handwriting can be directly recognized by Apple’s Ink software, but the software isn’t trainable; I have to train myself to carefully write characters and words. I find Ink suitable for taking short notes and answering email. Axiotron is working with a software developer to bring cursive handwriting to the ModBook, but that feature is behind the original release schedule.

IRIS includes printed handwritten input recognition in ReadIRIS Pro. It requires the user to print input onto a special printed form. Personally, I found it unusable. ABBYY includes handwriting recognition in OCR, but at additional cost.

One of these days, OCR software will evolve to accurately handle printed input, and probably less accurately handle handwritten text in a mixed printed/handwritten document. Practically speaking, that time hasn’t yet arrived, certainly not at consumer prices.

Thank you for the wonderfully quick response!

I regret that I may have been unclear in my original description. The text that isn’t being recognized by the OCR is not in fact handwritten but is printed by a thermal or dot matrix printer or similar on top of the pre-printed form. So the document is a combination of pre-printing (e.g., “Ticket Number” heading) and on-the-fly printed output (e.g., “5010032893”). The pre-print is recognized by the OCR. The on-the-fly printed output is not.

Thanks!

Alan

P.S. For what it’s worth, if I scan the same traffic violation ticket into Adobe Acrobat 8 Professional and use its OCR, the result is that Adobe recognizes all of the text in the ticket, both pre-printed and on-the-fly text.

I haven’t tried any thermally printed copy with the current release, DTPO2 pb3.

But I’m now running an improved plugin that also solves the problem of black images from some PDFs.

I just scanned a rumpled, poor-quality thermal copy and got good OCR accuracy from it.

So hold onto your traffic ticket (but either appeal or pay it!) and try again with the next release of DTPO2. :slight_smile:

Cool, thanks.

Would you like a copy of the scanned image to test against your new plug-in? If so, please tell me how best to convey it to you.

Thanks,
Alan

Sure, Alan. Just attach it to a message to Support.