OCR PDF produces blank pages

kenliles · March 4, 2013, 11:56pm

I have some docs that, when OCRed, get replaced with all blank pages. It seems to do the work of OCR, as I see it working in the Activity Window. When it completes, the new OCRed document has the same number of pages, all blank. It’s only some documents; most work fine. Any clues on this behavior from anyone?

thanks in advance

kenliles · March 6, 2013, 6:03pm

Has anybody seen this behavior when converting to searchable PDF?

Bill_DeVille · March 6, 2013, 6:46pm

I’ve never had that happen.

Please attach to a message to Support an example scanner image file that, when sent to DEVONthink Pro Office for OCR, produces only blank pages. (And add a brief note about the issue you are encountering.)

kenliles · March 6, 2013, 6:56pm

Will do, thanks.

kenliles · March 8, 2013, 12:35am

Sent Support a couple of sample documents. I guess these docs were scanned at too low a resolution to OCR correctly (I checked and they are 72 ppi). OK, I guess, but that’s pretty odd behavior. It might be good for others to know: if OCR replaces everything with blank pages, the scan resolution is likely too low.

Support says OCR can’t work miracles; I guess I know now that’s what I was asking of it. I must say, though, I’ve seen OCR at 72 ppi do better than nothing on normal (12 point) text imagery…

here’s praying for miracles…

Bill_DeVille · March 8, 2013, 12:58am

I agree that I’ve sometimes gotten usable OCR accuracy from a simple image with 12 point text at 72 dpi (from a screen capture). But that’s stretching it. FAX has better resolution than that. If you want good OCR accuracy as the normal result of scanning documents, do the scan at 300 dpi.

Of course, the higher the resolution, the larger the resulting file.

I don’t find it necessary or desirable to save the searchable PDFs resulting from OCR at 300 dpi, or to retain the original scan resolution in the searchable PDF stored in DEVONthink Pro Office. For most of my scanned documents coming from my ScanSnap, I choose settings of 130 dpi and 50% image quality in DEVONthink Pro Office Preferences > OCR (that’s better than FAX view/print quality). The resulting searchable PDFs are usually smaller than the original files produced by the ScanSnap.
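To put rough numbers on the resolutions mentioned in this thread, here is a minimal Python sketch. It assumes 1 point = 1/72 inch and a US Letter page (8.5 x 11 in); the function names are made up for illustration and have nothing to do with DEVONthink or its OCR engine. It only shows how few pixels a 12 point glyph gets at 72 dpi compared with 300 dpi, and how the pixel count per page grows with resolution.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def text_height_px(point_size: float, dpi: float) -> float:
    """Approximate height in pixels of text of the given point size at the given dpi."""
    return point_size * dpi / POINTS_PER_INCH


def page_pixels(dpi: float, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Total pixel count of a page scanned at the given dpi (US Letter assumed)."""
    return round(width_in * dpi) * round(height_in * dpi)


if __name__ == "__main__":
    # Resolutions mentioned in the thread: 72, 130, 150 and 300 dpi.
    for dpi in (72, 130, 150, 300):
        height = text_height_px(12, dpi)
        pixels = page_pixels(dpi)
        print(f"{dpi:>3} dpi: 12 pt text is about {height:.1f} px tall, "
              f"page has {pixels:,} pixels")
```

At 72 dpi a 12 point line is only about 12 pixels tall, while at 300 dpi it is about 50 pixels tall, at the cost of roughly 17 times as many pixels per page.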

kenliles · March 8, 2013, 1:12am

Yeah, good point, and that works well for me too.
When I scan via ScanSnap I use something like 150 ppi, which works great.

The docs I’m struggling with come from someone else’s scan and I can’t control them.

It’s OK. Now that I know, I just give them long names containing descriptive text. Support did offer that I could associate a textual description via Data- Annotation, a good suggestion that might help from time to time.

It was a little disingenuous of Support to suggest that this level would be miracle stuff, but at least I know how to proceed.

Thanks for your help, Bill.

(love the product in general if anybody’s wondering!)