Compress/resample graphic after scan/OCR?


Is there any way to reduce/resample the background (the original scanned image) after a successful OCR? To jpg-ize the graphics?

When a document is scanned at 300 dpi, the resulting PDF becomes kinda unusable for email and sluggish to read. One scanned page comes to 2-3 MB; this could be reduced to a tenth without any problem, especially since the text is there in the text layer if you need it.

I tried Print -> Compress PDF before mailing manually, but the file size didn’t change. I know I could open the PDF in Photoshop and fix it myself, but I am lazy… :wink:

Also, the size of the item in DTP is given as 32 KB, but when emailing it, it is suddenly 5 MB…??

A PDF is re-rasterized when it is OCRd. DT Pro Office saves the image layer at 150 dpi as a compromise between file size and reasonably good printing (although some request saving at a higher resolution for printing).
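To put rough numbers on that compromise, here is a back-of-the-envelope sketch in Python. It assumes a US Letter page (8.5 x 11 in) and 24-bit colour, neither of which is stated above, and it only counts raw, uncompressed pixels. It says nothing about the size of the compressed PDF that is actually written.

```python
# Rough estimate of raw (uncompressed) image data for one scanned page.
# Assumptions for illustration only: US Letter page, 3 bytes per pixel (24-bit colour).

def raw_page_bytes(dpi, width_in=8.5, height_in=11.0, bytes_per_pixel=3):
    """Return the uncompressed byte count of a page rasterized at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

at_300 = raw_page_bytes(300)
at_150 = raw_page_bytes(150)
ratio = at_300 / at_150

print(f"300 dpi: {at_300:,} bytes raw")
print(f"150 dpi: {at_150:,} bytes raw")
print(f"ratio:   {ratio:.1f}x")
```

Halving the resolution quarters the pixel count, so going from 300 dpi to 150 dpi removes about three quarters of the raw image data before any compression is applied.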

Preview can Save As a PDF with reduced file size, often resulting in significantly smaller PDFs but with loss of print quality.

In your example of a PDF that requires only 32 KB, select that document and open the Info panel. Size refers to the memory space required in your database (essentially, the text content of the PDF); look at the file size, which is the space required on disk to hold the file. For PDFs, file size is often orders of magnitude larger than size.
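If you would rather check the on-disk figure outside the Info panel, a minimal Python sketch follows. The helper name `format_size` is made up for this example, and the PDF path is whatever you pass on the command line; nothing here is specific to DEVONthink.

```python
import os
import sys

def format_size(num_bytes):
    """Format a byte count using 1024-based units (bytes, KB, MB, GB)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        # Stop at the first unit where the value drops below 1024,
        # or at GB, which is the largest unit handled here.
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024

if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]                 # path to the PDF, supplied by you
    length = os.path.getsize(path)     # length of the file in bytes
    print(f"{path}: {format_size(length)}")
```

Run it as `python filesize.py /path/to/your.pdf` (the script name is arbitrary). The number it prints is the one that matters for email attachments, not the Size value shown for the database item.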

I personally would prefer if the user could set the downsampling in the preferences. I am not too happy with the resulting quality of the image. I’d rather use other programs to reduce the file size that do a better job of preserving the image.

Annard has noted that it may be possible to provide a user option of “high” or “low” resolution in OCR preferences. “Low” resolution would be the current resolution (150 dpi). “High” resolution would be the resolution provided by the initial release of DT Pro Office.

When DT Pro Office was first released, one of the most frequent user requests was to lower the file size by reducing the default resolution provided by the IRIS OCR plugin.