DEVONthink makes already OCRed PDFs several times larger

Maybe that can be scripted?

That would be great, @chrillek!
I am new to DEVONthink and have not worked with scripts, but that would solve my issue.

I don’t have Adobe Acrobat. There’s online documentation available at Adobe’s site and a quite old example elsewhere. I’m sure Google will provide more information on that.
Update: There’s also a thread here.

This is background information (or background noise). It won’t solve the problem, but it’s interesting.

You can create a low-quality profile (filter) in the ColorSync Utility.

Open the PDF in Preview and then choose “File > Export” (not “Export as PDF”, just “Export”).

Set the export format to PDF and choose your low-quality ColorSync filter in the “Quartz Filter” field.

That will reduce the size of a bloated PDF some, but won’t get all the worms back in the small economy size can.

I keep wanting to look closer with Python’s PDF library, but between training, a panicked rush to find a job, and hipster sloth, I haven’t gotten to it yet.

It’d be good if you could provide an example of such a PDF.

Hi @Silverstone,
Thanks for looking into this! I have uploaded two versions of a PDF. Both were compressed and OCRed in Adobe Acrobat, but only one was then opened and annotated in DEVONthink.

Mileti et al. (1975) after annotation in DEVONthink.pdf (14.3 MB)
Mileti et al. (1975) compressed in Adobe Acrobat.pdf (2.0 MB)

I can confirm that after annotating (changing) the PDF and saving it in DT, the PDF grows from 2 MB to about 15 MB.
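If anyone wants a quick first look at where the extra megabytes come from before reaching for a proper PDF library, here is a rough stdlib-only Python sketch. It is only a diagnostic, not a fix, and the script name `pdf_filters.py` and the functions `count_stream_filters` and `compare` are made up for illustration. It prints both file sizes, the growth factor, and how often each stream `/Filter` name (e.g. `/FlateDecode`, `/DCTDecode` for JPEG, `/JBIG2Decode`) appears. The assumption, which nobody in this thread has confirmed, is that a change in how the page images are encoded would show up in those counts. Scanning raw bytes works because stream dictionaries are never stored inside compressed object streams, but the script does not really parse the PDF, so treat the numbers as hints.

```python
import re
import sys
from collections import Counter

# Matches a /Filter entry: either a single name (/FlateDecode) or an
# array of names ([/FlateDecode /DCTDecode]).
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_stream_filters(data: bytes) -> Counter:
    """Count how often each /Filter name appears in raw PDF bytes.

    Heuristic only: it scans the file as plain bytes and does not parse
    the object structure. An encryption dictionary's /Filter (e.g.
    /Standard) would be counted too.
    """
    counts = Counter()
    for match in FILTER_RE.finditer(data):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


def compare(path_before: str, path_after: str) -> None:
    """Print sizes, growth factor and filter counts for two PDF files."""
    with open(path_before, "rb") as f:
        before = f.read()
    with open(path_after, "rb") as f:
        after = f.read()
    for path, data in ((path_before, before), (path_after, after)):
        print(f"{path}: {len(data) / 1e6:.1f} MB, {dict(count_stream_filters(data))}")
    print(f"growth factor: {len(after) / len(before):.1f}x")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: pdf_filters.py BEFORE.pdf AFTER.pdf")
    else:
        compare(sys.argv[1], sys.argv[2])
```

With the two files uploaded above, that would be run as `python3 pdf_filters.py "Mileti et al. (1975) compressed in Adobe Acrobat.pdf" "Mileti et al. (1975) after annotation in DEVONthink.pdf"`. If the filter mix differs noticeably between the two, that points at image re-encoding; if it doesn't, the bloat is probably somewhere else.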

The best workaround I see is not to make any changes in DT; make them in, e.g., PDF Expert or Acrobat instead. After saving there, the file remains 2 MB. In DT you may change any metadata, including custom metadata, but do not touch the PDF’s own metadata via DT.

Thanks for checking! Perhaps this is something for the development team to look into, since other programmes (e.g. Preview and PDF Expert) seem to be able to maintain the file size, just as @BLUEFROG suggested. I keep my fingers crossed that it can be solved in DT, as that would simplify my workflow significantly.