I have a DT4 database of about 50 GB containing over 10,000 PDF files. I want to run all the PDFs through an app such as PDF Squeezer to hopefully reduce the database's total size substantially.
Does anyone have experience with a script that would automatically run something like PDF Squeezer on each file, while staying within DT4, to optimize these PDFs without corrupting my DEVONthink database?
I use this, although I do it one PDF at a time. If you were to do many PDFs, especially if they are several MB each, mind you it can take a LONG time, possibly days.
I am sure the script can be improved.
-- Get the selected PDF document in DEVONthink
tell application id "DNtp"
	set selectedDocs to selection
	if (count of selectedDocs) is not 1 then
		display dialog "Select one PDF document to compress."
		return
	end if
	set docToCompress to item 1 of selectedDocs
	-- Check whether the selected item is a PDF
	if (type of docToCompress is PDF document) then
		-- Get the path of the selected PDF document
		set inputPDFPath to path of docToCompress
		-- Construct the PDF Squeezer command
		set pdfSqueezerCommand to "/usr/local/bin/pdfs " & quoted form of inputPDFPath & " --replace"
		-- Execute the command
		do shell script pdfSqueezerCommand
		-- Display a dialog when compression is complete
		display dialog "PDF compression completed."
	else
		-- If the selected item is not a PDF, show an error message
		display dialog "Selected item is not a PDF document."
	end if
end tell
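For batch use, the same `pdfs` invocation the script builds could in principle be driven from a plain shell loop over a folder of PDFs. Here is a minimal sketch, assuming PDF Squeezer's CLI lives at `/usr/local/bin/pdfs` as in the script above; the `squeeze_folder` name and the `PDFS`/`DRY_RUN` knobs are my own illustrative additions, not part of the tool. Since `--replace` overwrites each file in place, try it on copies (or with `DRY_RUN=1`) before pointing it at anything important:

```shell
# Batch-compress every PDF under a folder with PDF Squeezer's CLI.
# PDFS and DRY_RUN are illustrative knobs, not pdfs options:
# set DRY_RUN=1 to only print the commands that would run.
PDFS=${PDFS:-/usr/local/bin/pdfs}

squeeze_folder() {
    dir=$1
    # find emits one path per line; "IFS= read -r" preserves
    # spaces in file names (newlines in names would still break this).
    find "$dir" -type f -name '*.pdf' | while IFS= read -r f; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf '%s\n' "$PDFS $f --replace"
        else
            "$PDFS" "$f" --replace
        fi
    done
}
```

As noted above, with more than 10,000 files this would still run for a very long time, and running it against an exported copy of the PDFs rather than directly inside the database package is the safer route.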
I use PDF Squeezer to “squeeze” files before they get imported into DEVONthink. That process is partly automated with PDF Squeezer’s command-line interface. On the initial read of the article I decide the extent of squeezing, e.g. choosing No Images, Strong Compress, Medium Compress, etc. For files already in DEVONthink, I use “Open With” after selecting one or many PDFs, again inspecting the results and deciding on the extent of squeezing. Squeezing too far sometimes makes the squeezed files hard to read.
For how I do things, I don’t see the value in scripting batch squeezing with PDF Squeezer; I use the “Open With…” feature instead. I know you have a huge number of files and the idea of automation is attractive. But my instinct would be to create a Smart Rule that shows all PDFs, sort by size, and attack the shrinking starting with the biggest files first. Say, select a few dozen or more at a time, then flip to PDF Squeezer to choose how much squeezing, squeeze, save, and close. I did that with all the PDFs that pre-date my current pre-processing. Yes, it might take a while for more than 10,000 files, but it may protect against ending up with unreadable files.
It also gives you the opportunity to prune any files you no longer see a need to retain.