When I updated from 3.0.1 to 3.0.2, DT appeared to automatically install a new version of ABBYY. The image layer in OCR’d PDFs seems significantly degraded.
I had hoped that the new OCR engine would produce PDFs with an image layer less degraded than what I’ve been seeing in 3.0.1. When the imported scan is not of great quality, I usually have to save the document in both OCR and image versions, because the OCR typically produces a PDF that, while searchable, is harder or less pleasant to read. Instead of an improvement, there is a decline in quality. I hope there is a straightforward fix.
PDF before OCR:
PDF after OCR in DT 3.0.1:
PDF after OCR from original image-only PDF in DT 3.0.2:
Here are the settings used for both OCR imports: Compress PDF is off. Deskew and Page Orientation are on.
I would prefer a PDF that is close to, or identical to, the imported version in legibility, even if the resulting file is larger, since I’m currently saving two copies anyway. The 3.0.1 version is fuzzy but easier to read than the 3.0.2 version. And of course the unprocessed image is much better than both. Thanks for any advice!
Same here. No matter how I set the option under the OCR tab, to compress or not to compress the PDF, it seems that converting a document to a searchable PDF compresses it anyway. I just want to keep the original PDF with searchable text added.
I just tried uninstalling Devonthink 3.0.2 and then reinstalling 3.0.1. I also downloaded and installed the Abbyy Finereader plug-in afterwards, but it’s still the same. I believe the issue is the Abbyy Finereader plug-in that’s currently in the cloud, because 3.0.1 now behaves the same as 3.0.2 did after the upgrade.
Using the current version of ScanSnap, I have ScanSnap doing the OCR and have set up Devon to NOT perform OCR. It seems to be working well, but I don’t have any idea whether this setup changes anything else. Scans are clear and have OCR.
So I found the post “Abbyy finereader within DT 3”, which shows the location of the DTOCRHelper that Devon uses to OCR the PDF. Both the newly downloaded file and the one I already have on my computer are dated November 8th, 2019. Does anyone happen to have a DTOCRHelper file from before this date?
We need to be able to count on the IMAGE being of higher quality, since OCR is what it is: at best, not accurate. I hope ERIC and the design team are reading this. When changes are made that impact our data, it would be nice to KNOW or BE ADVISED. I for one do not appreciate DT making decisions for me, especially when they affect MY data. Awaiting your comments, DT staff.
I have also reverted to the previous version, but the image is still degraded compared to the original scan. OCR with PDFPenPro does not affect the image layer, at least not that I can detect; however, that of course entails an additional step, and the ABBYY OCR is marginally better.
Please ask ABBYY to include an unaltered image as an option, in addition to compression choices.
A lot of OCR software on the Mac sucks big time by not being able to preserve the CCITT T.6 (Group 4) lossless image compression when rewriting a PDF file. This compression format has been the de facto industry standard for scanning and archiving for multiple decades, but Apple’s PDFKit framework seems only able to decode (= display) such PDFs and apparently cannot write them, and software companies producing OCR or document management software for the Mac are either oblivious to this problem (which has existed for many, many years, e.g. see this forum post from 2012) or just don’t bother.
I had hoped that with DTP v3 things would have changed for the better, but apparently not.
So on a Mac one should either not alter PDF files coming from a scanner (all the original bundled scanner software that I know of does use CCITT Group 4) or use Adobe Acrobat. Both alternatives suck.
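For anyone who wants to check what actually happened to the image layer, here is a rough Python sketch that counts the image compression filters named in a PDF, so a file can be compared before and after OCR. It is only a heuristic under stated assumptions: it scans the raw bytes for the standard PDF filter names instead of parsing the file, it will not work on encrypted PDFs, and the function names and the `before.pdf` / `after.pdf` file names are made up for illustration. Note that `CCITTFaxDecode` covers both Group 3 and Group 4, and that images stored with `FlateDecode` are not counted because that filter is also used for non-image data.

```python
import re
from collections import Counter

# Standard PDF stream filter names that normally indicate image data.
# CCITTFaxDecode = fax-style bilevel (Group 3 or Group 4), DCTDecode = JPEG,
# JBIG2Decode = JBIG2 bilevel, JPXDecode = JPEG 2000.
IMAGE_FILTERS = (b"CCITTFaxDecode", b"JBIG2Decode", b"DCTDecode", b"JPXDecode")

# Matches "/CCITTFaxDecode", "/DCTDecode", etc. anywhere in the raw bytes.
_FILTER_RE = re.compile(rb"/(" + b"|".join(IMAGE_FILTERS) + rb")\b")


def count_image_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each image filter name appears in the raw PDF bytes.

    Heuristic only: no real PDF parsing is done, so treat the numbers as a
    hint about which compression is in use, not as an exact image count.
    """
    return Counter(
        match.group(1).decode("ascii") for match in _FILTER_RE.finditer(pdf_bytes)
    )


def filters_in_file(path: str) -> Counter:
    """Read a PDF from disk and count the image filter names it mentions."""
    with open(path, "rb") as handle:
        return count_image_filters(handle.read())


# Hypothetical usage (file names are placeholders):
#   print(filters_in_file("before.pdf"))
#   print(filters_in_file("after.pdf"))
```

If the original scan reports `CCITTFaxDecode` and the OCR’d copy reports only `DCTDecode`, the bilevel image was most likely re-encoded as JPEG along the way, which would fit the degradation described above.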
Finereader Pro output shown. Left: uncompressed; right: low quality.
Do your PDFs look better in Adobe Acrobat Reader? (Preview, and by extension all renderers using PDFKit, prerender in low-resolution greyscale. But usually this clears up within a second, if not instantaneously.)