Further degradation of OCR PDF image layer in 3.0.2

Resolution: Use Time Machine to restore the ~/Library/Application Support/DEVONthink 3/Abbyy/DTOCRHelper.app file to the version prior to the upgrade.

1 Like

We need to be able to count on the IMAGE being of higher quality, since OCR is what it is: at best not fully accurate. I hope ERIC and the design team are reading this. When changes are made that impact our data, it would be nice to KNOW or BE ADVISED. I for one do not appreciate DT making decisions for me, especially when it affects MY data. Awaiting your comments, DT staff.

1 Like

We are currently working with ABBYY to determine the cause of the degraded image quality. This is an issue with the ABBYY FineReader product, and they are investigating it.

5 Likes

Thank you for the hint and file location:

I copied the previous version (1.0.23) from Time Machine, which solved it in my case.

I have also reverted to the previous version, but the image is still degraded from the original scan. OCR with PDFPenPro does not affect the image layer, at least as far as I can detect; however, that of course entails an additional step, and the ABBYY OCR is marginally better.
Please ask ABBYY to include an unaltered image as an option, in addition to compression choices.

A lot of OCR software on the Mac sucks big time by not being able to preserve the CCITT T.6 (Group 4) lossless image compression when rewriting a PDF file. This compression format has been the de facto industry standard for scanning and archiving for decades, but Apple’s PDFKit framework seems to only be able to decode (i.e. display) such PDFs and apparently cannot write them, and software companies producing OCR or document management tools for the Mac are either oblivious to this problem (which has existed for many years, e.g. see this forum post from 2012) or just don’t bother.

I had hoped that with DTP v3 things would have changed for the better, but apparently not.

So on a Mac one should either not alter PDF files coming from a scanner (all bundled scanner software that I know of does produce CCITT Group 4) or use Adobe Acrobat. Both alternatives suck.

That doesn’t look like MRC compression.
When a black-and-white PDF is compressed with MRC, it gets more jagged.

FineReader Pro output shown: left uncompressed, right low quality.

Do your PDFs look better in Adobe Acrobat Reader? (Preview, and by extension all renderers using PDFKit, prerenders pages in low-resolution greyscale, but usually this clears up within a second, if not instantaneously.)

@Macster I think OCRmyPDF (https://github.com/jbarlow83/OCRmyPDF) retains image quality at or near lossless on macOS (via the command line). But I don’t think its OCR fidelity (it uses the Tesseract engine) is as good as ABBYY’s in DT3.
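For reference, a minimal sketch of driving OCRmyPDF from Python rather than the command line; the file names are placeholders and the `optimize` argument is an assumption based on its documented `--optimize` option, so check it against the version you install:

```python
# Minimal sketch: add a Tesseract text layer to a scanned PDF with OCRmyPDF.
# Assumes `pip install ocrmypdf` plus a local Tesseract installation.
import ocrmypdf

ocrmypdf.ocr(
    "scan.pdf",       # input PDF from the scanner (placeholder name)
    "scan-ocr.pdf",   # output PDF with the text layer added (placeholder name)
    optimize=0,       # assumption: mirrors --optimize 0, i.e. skip the image-optimization pass
)
```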

@jerwin Here’s a screenshot of the same part of the file above, OCR’d in DTP 3.0.2 and then opened in Adobe Acrobat Reader DC 2019.21.20056:


To my eye it exhibits the blur effect more apparent in the DTP 3.0.1 image above, rather than the blockiness of the DTP 3.0.2 image, but resolution is still objectionably degraded.

Yes, there are a couple of OCR apps/tools which support the original compression and just add their text layer to the file, which I consider the right thing to do. But it seems that on the Mac a lot of companies just take the simplistic approach and use Apple’s PDFKit without much consideration. I don’t care if PDFs created for printing purposes (e.g. by the macOS Preview app) recompress images, but for archiving purposes it’s a no-go.

One can easily check what is going on by opening the PDF file with a text editor. In PDFs produced by a bundled scanner app one typically finds lines like this for the page images:

`<</Type/XObject/Subtype/Image/Width 2477/Height 3507/ColorSpace/DeviceGray/BitsPerComponent 1/Filter/CCITTFaxDecode/DecodeParms<</K -1/Columns 2477/Rows 3507>>`

Watch out for the /Filter/CCITTFaxDecode — a simplistic app turns this into /Filter/FlateDecode (then it might be lossless conversion, but with a much bigger file size as this compression algorithm is not nearly as well suited for scanned images) or even /Filter/DCTDecode, which is the lossy JPEG compression.
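If you’d rather not eyeball the raw PDF, here is a rough sketch in Python that just counts occurrences of the common filter names in the file’s raw bytes. It only sees dictionaries stored as plain text (the normal case for scanner and OCR output); entries hidden inside compressed object streams won’t be counted.

```python
# Rough check of which stream filters a PDF uses, without third-party libraries.
# It scans the raw bytes for filter names, so it only finds dictionaries stored
# as plain text; filters inside compressed object streams are not visible.
import re
import sys
from collections import Counter

FILTERS = [b"CCITTFaxDecode", b"FlateDecode", b"DCTDecode", b"JBIG2Decode"]

def count_filters(path):
    data = open(path, "rb").read()
    return Counter({name.decode(): len(re.findall(rb"/" + name, data)) for name in FILTERS})

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, dict(count_filters(path)))
```

Running it on the scanner original and on the OCR’ed copy (e.g. `python3 check_filters.py scan.pdf scan-ocr.pdf`, file names hypothetical) shows whether the OCR step swapped CCITTFaxDecode for something else.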

@jerwin The jagginess is of course a matter of the image resolution, i.e. 300 dpi is more jagged than 600 dpi; with enough pixels, everything looks smooth.

1 Like

For my purposes (archiving business documents) I actually consider Tesseract good enough. No automated OCR will yield perfect results anyway, except perhaps under very controlled conditions, which don’t apply in my case, where each document differs in size, layout, fonts, colors and contents.

So for me OCR just 1. increases the odds that a document will show up in a Spotlight search, and 2. makes it possible to copy individual pieces of data from it, e.g. serial numbers (so I can save some typing, even though I need to check closely whether what’s copied matches the original text). So I’d choose a less-than-perfect OCR result over mangled images inside the PDFs any day.

EDIT: Especially since degrading the image makes it impossible to redo the OCR at some later point with better software. So one would need to keep the original PDF around, too. What nonsense!

To my eye, PDFPen Pro fits that description. I can’t see a difference between the before and after OCR versions.

That sounds great, but also have a look at the size of the resulting PDFs. If the file size increases considerably, that’s not because of the few words of OCR’ed text, which only take a few kilobytes; it’s because of some kind of recompression with a suboptimal codec.

Original 25.5 MB (from ExactScan, no OCR); PDFPen Pro output file 14.5 MB. Both use the `/Filter/FlateDecode` you mention above.
Edit: There’s a cost: compressed by the DT 3.0.1 OCR engine it’s 1.2 MB. Also FlateDecode, FWIW. Not as clear as the PDFPen output, but usable, unlike 3.0.2’s.

Your case is probably different from that of the original poster @chreliot, who had a bi-level black-and-white scan turned into greyscale with what looks like JPEG compression.

The /Filter/FlateDecode is lossless, it’s basically the same as LZW or ZIP, which can be performed with different “thoroughness” during the processing stage (= higher compression takes more time to compute) in order to possibly get smaller files but without sacrificing image quality.
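A small sketch to illustrate the point, using Python’s zlib (the same deflate algorithm behind /Filter/FlateDecode); the input file name is just a placeholder:

```python
# Deflate (the algorithm behind /Filter/FlateDecode) is lossless: different
# compression levels change the output size, never the decompressed content.
import zlib

data = open("page.pbm", "rb").read()   # placeholder: raw 1-bit page image

fast = zlib.compress(data, level=1)    # quick, usually larger output
best = zlib.compress(data, level=9)    # slower, usually smaller output

assert zlib.decompress(fast) == data   # identical after decompression
assert zlib.decompress(best) == data
print(len(data), len(fast), len(best))
```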

In order to get way smaller files with /Filter/FlateDecode, resolution needs to be decreased, i.e. pixels need to be dropped. Normally one does not want this either, because if one doesn’t need the resolution, one would have scanned the document with a lower one to begin with.

(But then again I am looking at the topic from an archiving background — if you don’t want or need to “preserve” a file but just want to keep it around in a size where you can still “read” it, all that might not be so important.)

1 Like

Appreciate your comments; very informative.

Except in the case where more resolution is needed for accurate OCR (small point sizes) but a lower one is OK for viewing/printing. So more lossy compression might be acceptable.
But yeah, everything is a tradeoff.

Original was black print on yellow paper, scanned in color, 300dpi.

OK, I agree this is a case where one surely desires smaller files when one is done processing them. A 300dpi color scan is huge.

With /Filter/CCITTFaxDecode one usually gets single-page PDFs which are only around 50 kilobytes in size at A4 @ 300dpi, and then it really hurts if just the OCR stage either turns them into 2 megabytes or degrades image quality, or even both. This is what I was mainly talking about, sorry.
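For a rough sense of where those numbers come from, here is a back-of-the-envelope calculation; the ~20:1 G4 ratio is a typical figure for scanned text pages, not a measurement of the files discussed in this thread:

```python
# Back-of-the-envelope page sizes; the ~20:1 G4 ratio is an assumed typical
# value for text pages, not a measurement of the files in this thread.
width_px  = round(8.27 * 300)    # A4 width  (8.27 in) at 300 dpi -> 2481 px
height_px = round(11.69 * 300)   # A4 height (11.69 in) at 300 dpi -> 3507 px

raw_bilevel = width_px * height_px / 8        # 1 bit per pixel, in bytes
print(round(raw_bilevel / 1024), "KB uncompressed bi-level")   # ~1062 KB

g4_estimate = raw_bilevel / 20                # assumed ~20:1 CCITT G4 ratio
print(round(g4_estimate / 1024), "KB with CCITT G4")           # ~53 KB

raw_color = width_px * height_px * 3          # 24-bit color at 300 dpi
print(round(raw_color / 1e6), "MB uncompressed color")         # ~26 MB
```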

1 Like

Didn’t mean to drag you off-topic, but learned a lot from your comments. Thanks!

1 Like

OK, after examining some files with a text editor:

FineReader Pro tends to replace CCITTFaxDecode with JBIG2Decode.

(A short PDF file containing CCITTFax data may be obtained through this link; click on ‘image-56.pdf’.)