Reducing PDF file sizes but maintaining OCR ability

garyburke · April 29, 2008, 8:12am

Hi there,
When I scan into DTPO (which I haven’t been able to do for a while on account of a miscellaneous Apple bug, and boy do I miss it), the resulting PDF files are very large. One 20-page document is 170 MB. I realise scans need high resolution for OCR, but some files I download from the web are readable by OCR and yet are less than 1 MB in size.
My database is getting very large, and I was wondering if there is an efficient way of reducing the size of my scanned PDFs. If I use either Preview or Acrobat Professional to reduce the file size, the result is fuzzy and unreadable, even by my own eyes.
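A quick back-of-the-envelope calculation shows why resolution and colour depth dominate the size of a scan. This is a minimal sketch under stated assumptions: a US Letter page (8.5 x 11 in) scanned at 300 dpi, with the image stored uncompressed. The poster's actual scanner settings aren't given, and real PDFs usually compress their page images, so treat the numbers as rough comparisons rather than predictions.

```python
# Rough estimate of the raw (uncompressed) bitmap size of one scanned page.
# Assumptions: US Letter paper (8.5 x 11 in) and 300 dpi; the actual scan
# settings in the post are unknown, and real PDFs normally compress images.

def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a page scanned at `dpi` and `bits_per_pixel`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


MB = 1_000_000

# Same page, same resolution; only the colour depth changes.
for bits, label in [(24, "colour"), (8, "greyscale"), (1, "black & white")]:
    size = raw_page_bytes(8.5, 11, 300, bits)
    print(f"300 dpi {label}: {size / MB:.1f} MB per page")

# The document from the post: 170 MB spread over 20 pages.
print(f"observed: {170 / 20:.1f} MB per page")
```

At the same resolution, dropping from 24-bit colour to 1-bit black & white shrinks the raw page by a factor of 24, which is one reason text-only scans can stay small without losing the sharpness OCR needs.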

I’d appreciate any advice.
Many thanks.

mkilci · May 8, 2021, 4:41pm

Same question. I have very large PDFs, which I open in Adobe Acrobat, reduce in size, and save. If there is a way to automate that, it would be great as well; I haven’t tried automating it yet…

Blanc · May 8, 2021, 6:06pm

Funny you should ask. This thread in the German subsection of the forum is about automating the use of PDF Squeezer, which would apparently do what you are after. The OP set up an Automator action/app with the required settings, and I wrote a script to automate sending PDFs to that app. We ran into some trouble, and I haven’t yet received feedback on my latest suggestion, but perhaps you want to tag along with the idea.

Adobe Acrobat is scriptable, so you may also be able to write a script that automates things using Acrobat. I haven’t scripted Acrobat myself, so I can’t provide you with a ready-made script for that. This script would be a suitable basis, though; you’d need to replace the "tell application Autokomprimierung" section with the appropriate code for Acrobat.
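The script from the linked thread isn't reproduced here. As a rough illustration of the same idea (hand every PDF in a folder to a compression app), here is a minimal Python sketch under these assumptions: you are on macOS, the compressor is an application (such as an Automator app) that accepts files passed via `open -a`, and its name is "Autokomprimierung" as in the German thread. Substitute your own app name. It only prints the commands unless you pass `dry_run=False`.

```python
import subprocess
from pathlib import Path

# Assumption: name of the Automator app from the linked thread; replace with yours.
APP_NAME = "Autokomprimierung"


def is_pdf(path: Path) -> bool:
    """True if the file extension is .pdf (case-insensitive)."""
    return path.suffix.lower() == ".pdf"


def build_open_commands(paths, app_name: str = APP_NAME) -> list[list[str]]:
    """One macOS `open -a <app> <file>` command per PDF in `paths`."""
    return [["open", "-a", app_name, str(p)] for p in paths if is_pdf(p)]


def send_pdfs(folder: Path, app_name: str = APP_NAME, dry_run: bool = True) -> None:
    """Send every PDF directly inside `folder` (not subfolders) to `app_name`."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    for cmd in build_open_commands(files, app_name):
        if dry_run:
            # Only show what would be run.
            print(" ".join(cmd))
        else:
            # `open -a` hands the file to the app; the app does the compressing.
            subprocess.run(cmd, check=True)
```

Usage: `send_pdfs(Path("~/Scans").expanduser(), dry_run=False)`. Note that `open -a` returns without waiting for the app to finish. If the app should process one file at a time, adding `-W` to the command makes `open` wait until the application exits.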

BLUEFROG · May 8, 2021, 6:57pm

Please provide information on the settings in Preferences > OCR.

Blanc · May 8, 2021, 7:09pm

I kind of assumed the idea was to reduce the size of PDFs that had not yet been OCRed (“maintaining OCR-ability”), but having reread the old original post, I’m not so sure. @mkilci, if I have misunderstood you, then my answer may be of less value; please accept my apologies.