I don’t know how long DEVONthink has been doing this, but I’ve only noticed it recently:
When I make any change to a PDF file (for example, adding a comment, adding a title to the table of contents, or any other seemingly minor edit), the file size at least triples. A 3 MB file, for instance, grows to about 9 MB.
What causes this behavior, and how can I fix it?
DEVONthink uses Apple’s PDFKit framework to handle PDF documents. When you edit and save a PDF, the framework may rebuild the entire file and, for example, recompress or decompress images. Whether this happens also depends on the kind of change and on the document itself; simply annotating a PDF that has a valid text layer, for instance, shouldn’t cause it.
However, in practice this makes it almost impossible to use DEVONthink to annotate PDFs. For example, if I work on files totaling about 3 GB in a year, they’ll have grown to about 9 GB by the end of it. That is a significant amount of my Mac’s storage, and what happens after I’ve been working this way for many years?
Since the issue is related to Apple’s PDFKit framework, I strongly suggest finding an alternative to it as soon as possible.
For what it’s worth, I use the app “PDF Squeezer” to, yeah, “squeeze” PDF file sizes. These PDFs are usually created by Apple’s PDF services before they reach DEVONthink, and they are sometimes large. It works well for me.
And that solution would look like what? Forcing or convincing Apple to get their act together? That’s about as likely as establishing world peace. They’re focusing on AI now, and their attention span is that of a three-year-old. So they’ll be focusing on something else in two years’ time, and PDFKit, along with Automator, AppleScript, Aperture, and a bunch of other products, will be rotting away like their AI.
Using another framework? That will drive DT’s price up, since it has to be licensed. It must also be integrated, which takes time and effort away from other programming tasks. And since we’re talking software here, it will have its own bugs, shortcomings, and whatnot, which you won’t know about in advance, unlike many of the long-known bugs in venerable PDFKit.
Finally, 6 GB more in one year amounts to about 120 GB more if you use your machine for 20 years (do you?). If that really exceeds your Mac’s hard drive capacity, you could put your database on an external drive.
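The arithmetic behind these figures can be sketched as follows. This is only a rough model that assumes, as both posts implicitly do, that the same volume of PDFs is edited every year and that every edited file grows by the same factor; real growth varies from document to document. The function name `extra_storage_gb` is hypothetical.

```python
def extra_storage_gb(edited_gb_per_year: float, growth_factor: float, years: int) -> float:
    """Extra disk space (GB) used after `years` years.

    Assumes (hypothetically) that the same volume of PDFs is edited each year
    and that every edited file grows by the same `growth_factor`.
    """
    # Each year's files grow from edited_gb_per_year to
    # edited_gb_per_year * growth_factor, so the extra space per year is:
    extra_per_year = edited_gb_per_year * (growth_factor - 1)
    # The model is linear: each year simply adds the same amount again.
    return extra_per_year * years


print(extra_storage_gb(3.0, 3.0, 1))   # 6.0   (3 GB -> 9 GB in one year)
print(extra_storage_gb(3.0, 3.0, 20))  # 120.0 (the 20-year figure above)
```

With a smaller growth factor, such as the roughly 18% increase reported later in this thread, the same model gives about 10.8 GB over 20 years instead of 120 GB.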
This is descriptive, not prescriptive: not all PDFs are the same, so you can’t reliably assume that changes will triple the size. PDFs come in a dizzying array of forms produced over the years, many with very aggressive compression. When a PDF is resaved, its contents are compressed with the algorithms of whichever framework does the saving. This can leave the file less (or even more) compressed than the original.
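To illustrate how a resave can change the size without changing the content, here is a small sketch using Python’s standard `zlib` module. Deflate is the algorithm behind PDF’s FlateDecode filter, but the stream below is made-up sample data, and real PDF writers such as PDFKit make their own choices (not modeled here) about which streams to recompress and at what level. Level 0 stores the data uncompressed, roughly what happens when a writer decompresses a stream and saves it that way.

```python
import zlib

# Hypothetical stand-in for an uncompressed PDF content stream: repetitive
# text-drawing operators compress very well with Deflate.
stream = b"BT /F1 12 Tf 72 712 Td (Hello, annotated world) Tj ET\n" * 2000

# Encode the same bytes at different zlib levels, as different PDF writers
# might when they resave a document (0 = stored, 9 = maximum compression).
sizes = {level: len(zlib.compress(stream, level)) for level in (0, 1, 6, 9)}

for level, size in sizes.items():
    print(f"level {level}: {size} bytes (raw: {len(stream)} bytes)")

# Every level decodes back to the identical stream; only the stored size differs.
for level in sizes:
    assert zlib.decompress(zlib.compress(stream, level)) == stream
```

The content is identical in every case, so a file’s size after a resave says more about how the writer chose to encode it than about how much was actually changed.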
I hadn’t noticed the issue, but I haven’t been annotating that much. I just ran a test and saw only a small increase in size (about 18%). Another workaround for you might be to open the file in Preview and immediately export it with the Quartz Filter option set to Reduce File Size. The same annotated file I tested shrank by a factor of 16. There may even be a way to automate the process with a smart rule or something similar.
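To compare before-and-after sizes the way the tests in this thread do, a small helper like the one below could be used. It is only a sketch: the function names and the 1.5× threshold are hypothetical, and it does nothing but the arithmetic. Actually re-exporting a file (via Preview’s Quartz Filter, PDF Squeezer, or however that ends up being automated) is left to those tools.

```python
from typing import Tuple


def size_change(before_bytes: int, after_bytes: int) -> Tuple[float, float]:
    """Return (percent change, growth factor) between two file sizes."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    factor = after_bytes / before_bytes
    return (factor - 1) * 100, factor


def should_recompress(before_bytes: int, after_bytes: int, max_growth: float = 1.5) -> bool:
    """Hypothetical rule: flag a file whose size grew by more than `max_growth`x."""
    _, factor = size_change(before_bytes, after_bytes)
    return factor > max_growth


print(size_change(3_000_000, 9_000_000))        # (200.0, 3.0): the tripling case
print(should_recompress(3_000_000, 9_000_000))  # True
print(size_change(1_600_000, 100_000))          # (-93.75, 0.0625): a 16x reduction
```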