I'm trying to use OCR to convert an image-based PDF into a searchable PDF. In the settings, I unchecked “Compress PDF” and set DPI=300. The original file is 14 MB; after conversion it's only 4 MB, and the images inside are a little blurry.
I don't want any quality loss; I only want to make the text searchable. How can I do that?
Can you provide a copy of the original file so that I can test it, and which version of macOS are you running?
Building a Second Brain- The Illustrated Notes.pdf (11.5 MB)
Please check it out. Sorry, my mistake: the original file size is around 12 MB. My macOS version is 11.1.
Thanks for sending the file. It looks like there are some additional artefacts around the text that could have been introduced during page extraction. I have made a change that should reduce this.
But why does the image quality seem to drop a little after conversion (file size from 12 MB → 4 MB)? Is it possible to keep the image quality and just add the extracted text?
The size difference between the original file, which was generated with macOS PDFKit, and the OCR'd file comes from ABBYY OCR compressing significantly better than PDFKit.
That’s not the point here.
As he wrote, he disabled compression.
ABBYY will always apply some compression. When the “Compress PDF” option is off, this affects the final PDF size in two ways:
- If metadata is added to or transferred from the original file, the saved file will not be compressed.
- When ABBYY OCR generates the PDF, the priority is set to quality over size; however, it is ABBYY that determines the actual amount of compression applied to the final file.
As I understood the OP, the visual quality of the PDF was affected.
This seems to point to lossy compression being used.
Disabling compression in the settings should AT LEAST disable any lossy compression!
The issue is not due to compression but to the extraction of the page image from the original PDF, which, as I said earlier, has been fixed.
Ah, then I'm sorry; I missed that point and only noticed that compression is always applied.
Not a problem, happy to explain the cause of the issue.