OCR Quality

I did an experiment with OCR’ing a PDF import today. Even using the default settings of 150 dpi and 75% quality, the file size doubled (5.5 MB --> 11 MB). In contrast, by using Acrobat I could optimize the file while OCR’ing, in which case the resulting file (2.7 MB) was actually smaller than the original and of higher quality. Is there a setting that I missed, or is DT’s OCR processor just not all that hot? If the latter, will this be improving? Going through the extra step of using Acrobat wouldn’t be that big a deal for a once-in-a-while effort, but… And most people don’t have Acrobat.

This has come up many times, and there is only so much we can do, since we are not PDF experts like Adobe and don’t want to be. That said, the final release allows you to pass black-and-white scans through with no alterations, and that may save some of your precious disk space. No guarantees.

Woo hoo! Let’s hope this works. In my testing, this is the biggest source of file bloat & quality degradation.

You are not PDF experts, but Apple’s ColorSync Utility has compression methods that could be tapped into. I have a filter for ColorSync Utility that compresses PDFs for me. See this forum link, where you can get several filters:
discussions.apple.com/thread/12 … 0&tstart=0

It would be great if the OCR preferences could tap into something like this.

I haven’t tried the filters posted by danzac, but by their description, it seems that they still don’t address the fundamental problem with DTPO & Abbyy, which is that bitmap (1-bit) images get converted to JPEG instead of retaining their bitmap image quality. It is this conversion that increases the file size & reduces image quality. To date, Adobe Acrobat is still the only program I have found that will perform OCR, add the hidden text layer, and downsample the image resolution while retaining its bitmap quality.
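
To put rough numbers on the 1-bit point, here is a minimal sketch of the raw (uncompressed) pixel data for one page at the 150 dpi default mentioned above. The US Letter page size and the function name `raw_image_bytes` are my own assumptions for illustration; real PDFs compress both kinds of image (typically CCITT Group 4 or JBIG2 for 1-bit, JPEG for grayscale), so actual file sizes will differ. It only shows how much more data the encoder has to start with once a 1-bit scan is promoted to 8-bit.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page; each row is padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # round up to a full byte
    return row_bytes * height_px


# Assumed page: US Letter (8.5 x 11 in) scanned at 150 dpi.
bilevel = raw_image_bytes(8.5, 11, 150, bits_per_pixel=1)  # 1-bit bitmap
gray = raw_image_bytes(8.5, 11, 150, bits_per_pixel=8)     # 8-bit grayscale

print(f"1-bit page: {bilevel:,} bytes")
print(f"8-bit page: {gray:,} bytes")
print(f"ratio:      {gray / bilevel:.1f}x")
```

So before any compression, the grayscale version carries roughly eight times the data, and JPEG is also a poor fit for sharp black-on-white text edges, which would be consistent with the quality loss described above.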