OCR file sizes in DT3

So far I’ve been really impressed with the speed, quality and file size of OCR in DT3. However, that’s not maintained once I start merging files or manipulating pages, so I’d welcome any comments or suggestions. For the record, I’m working mainly with 300 ppi B&W scans.

Background: I’m a long-time user of DTPO2 to OCR scans from my Brother DCP-7045N. Because this is not a duplex scanner, I do a lot of merging within DT, as well as scripted page manipulations in PDFpen. The thing that binds it all together is Apago’s excellent PDF Shrink app, which I have integrated into DT via AppleScript. Any bloat introduced by moving pages around is easily fixed with PDF Shrink. The end result in DTPO2 was good-quality files of sensible size.

Enter DT3, and OCR seems to yield smaller files than before, with good optical quality. The problem is that if I merge files or delete pages, the file size balloons. PDF Shrink can fix this up to a point, but this time with a significant loss in optical quality, to the point where it’s not a workable solution.

I know that OCR and PDF manipulation more generally must be very complicated under the hood, and everyone seems to have their own workflow. But DT3 is increasing in power, as witnessed by Smart Rules starting to replace Hazel and even simple things like the “reverse pages” command (which I used to have to do in PDFpen). It would be nice not to have to depend on so many helper apps. So is it reasonable to hope that a future release will do a better job of keeping file sizes down?

I’ve experienced something similar. Annotating a PDF disproportionately increases the file’s size; e.g. highlighting a few words results in an enlargement from 4.2 MB to 14.8 MB.

Yes! Same here! Deleting pages increases the file size, but only if OCR was previously done with DEVONthink, which shrinks the file. After manipulating the PDF, the size increases.

If I delete pages from the original PDF, the file size decreases as expected.

Just to put this topic to rest, the problem of ballooning PDF file sizes seems to have been fixed in subsequent betas and now in the final release. Adding and deleting pages within DT3 now seems to yield files of the size you’d expect based on their original sizes.

Adding markup to PDFs still increases file sizes quite a bit, but I think this is a problem with any PDF software.