OCR Settings: Compress PDF, Deskew, Page Orientation

Thanks for the explanation.

In the meantime I ran some tests:

I sent documents scanned with a ScanSnap iX500 as PDFs to DT3 and ran OCR within DT3, both with and without compression enabled.
The resulting OCRed PDF was always significantly smaller than the original PDF, even though the OCR text layer had been added. So DT3 always recompresses the picture layer in the PDF during OCR. I can clearly see more JPEG compression artefacts in the OCRed PDF.
The OCRed file is about 29% of the original file size with compression enabled and about 38% with compression disabled.
The reduction in file size is smaller if the original PDF was saved with more compression in the ScanSnap Home app. It seems DT3 then has to deal with more JPEG compression artefacts in the original file, which hampers further recompression.
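For anyone who wants to repeat the comparison, here is a minimal Python sketch of how such percentages can be computed. The function names and the byte counts in the example are hypothetical and chosen only to reproduce the 29% and 38% figures; they are not my measured file sizes.

```python
from pathlib import Path


def size_ratio(original_bytes: int, ocr_bytes: int) -> float:
    """Return the OCRed file size as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * ocr_bytes / original_bytes


def compare_files(original: Path, ocr: Path) -> float:
    """Compare two PDFs on disk by their file sizes (hypothetical helper)."""
    return size_ratio(original.stat().st_size, ocr.stat().st_size)


# Illustrative byte counts only, not measurements from my scans.
original = 10_000_000
print(f"compression enabled:  {size_ratio(original, 2_900_000):.0f}%")
print(f"compression disabled: {size_ratio(original, 3_800_000):.0f}%")
```

To check real files, call `compare_files` with the path of the PDF as it arrived from the scanner and the path of the same document after OCR.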

As a result of my tests, I’ve now changed my settings in ScanSnap Home: picture quality from auto to best, compression from medium to low, and OCR disabled. This generates a significantly larger intermediate PDF in the ScanSnap Home app, which is automatically sent to DT3. Because of the recompression during OCR in DT3, the OCRed PDF is still smaller than the PDF I get from a lower-quality scan with OCR done in ScanSnap Home. The picture layer within the PDF is sharper as well. The result is better quality with smaller files compared to doing the OCR in ScanSnap Home.

While I do not see much difference in OCR accuracy between DT3 and ScanSnap Home, text blocks are easier to select in the PDF OCRed in DT3. In the PDF OCRed in ScanSnap Home, the selection often jumps to text all over the page, unrelated to the area I drag the mouse cursor over. That is one more reason I prefer to do the OCR in DT3.

So while I found a workflow that fits my needs quite well, the whole OCR recompression process in DT3 is a bit of black magic and not easy to understand. Personally, I’d prefer a setting in DT3 that lets me choose the recompression rate applied during OCR, maybe in 3 steps, with the option of no recompression at all in case I have already fine-tuned the quality in the scanning app. But I’m afraid this might be beyond your control, and the ABBYY OCR engine you’ve licensed might handle this in a closed environment.
