OCR doesn't work correctly

Hi everyone,

I just tried to convert some of my scanned documents that I couldn’t convert last week… After the conversion the PDFs are totally distorted and hardly readable anymore.

Please help me.

After a couple more tests I noticed that DevonThink mixes up the length and the width of the page. A scanned A4 page (roughly 30 cm x 20 cm) comes out as 20 cm x 30 cm after the conversion, and the text gets skewed.
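
In case it helps anyone compare their own files, here is a minimal Python sketch of the check I mean. It assumes the page sizes (in cm) have already been read from the PDF before and after OCR with some other tool. The function name and the tolerance are only my own illustration, not anything from DevonThink or ABBYY.

```python
def dimensions_swapped(before, after, tol=0.1):
    """Return True if `after` looks like `before` with the two sides exchanged.

    `before` and `after` are (length, width) pairs in cm.
    `tol` is a hypothetical tolerance for rounding differences.
    """
    b_len, b_wid = before
    a_len, a_wid = after
    # The sides are exchanged if each side of `after` matches the other side of `before`.
    swapped = abs(a_len - b_wid) <= tol and abs(a_wid - b_len) <= tol
    # A square page, or one that did not change, should not be reported as swapped.
    unchanged = abs(a_len - b_len) <= tol and abs(a_wid - b_wid) <= tol
    return swapped and not unchanged


# The sizes from my A4 scan: 30 x 20 before OCR, 20 x 30 after.
print(dimensions_swapped((30.0, 20.0), (20.0, 30.0)))
```

If this reports a swap while the page image itself has not been rotated, that would match the skewed text I am seeing.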

Any ideas? Thank you,


Marcus, could you please attach an image-only PDF and the PDF after OCR in a message to Support? (Relatively small ones, if possible.)

I had the same problem with the non-public beta. I posted a bug report and Annard was looking into it. I thought this had been fixed in PB3, but it seems not. I sent him the before and after versions of the PDF files.

Annard forwarded your PDFs to ABBYY and is awaiting their response.

I have never seen any problems with the image layer of searchable PDFs processed by ABBYY from my two scanners, a ScanSnap and a CanoScan LIDE 500F. The images are sharper and crisper than those produced by IRIS, and are properly aligned.

I have seen several PDFs sent in to Support that produce black screens, either on the first page only (e.g., some JSTOR files, where removing the black first page leaves a usable document) or with all pages black and unusable.

I haven’t seen any PDFs like those described by Marcus, which is why we would like a sample of the image produced by the scanner and the resulting OCR'd PDF.

It’s very useful to test such samples. In some cases, the problem seems to be the user’s computer environment. In others, ABBYY as configured in PB3 consistently produces anomalous results; in those cases, ABBYY is working with Annard to solve the problem.