Ah, Houston, we’ve had a problem.
Jim Lovell, Mission Commander, Apollo 13
Well, not really a problem, but an unexpected mismatch between DEVONthink and MIME types.
Well, OK, it shouldn’t have been unexpected, but the fun is in the journey, right? Right?
So, I was working through the script yesterday and realised that I tell DEVONthink to delete images which have been imported via import attachments of record and which are < minPictureSize, via this test:
if (the record type of currentAttachment) is picture and (the size of currentAttachment) < minPictureSize then
	delete record currentAttachment
else
	...
end if
But when it comes to including MIME parts of Content-Type: image/ that are less than that size when base64 encoded (MIME parts were being included because imported attachments were being deleted), I used this logic:
if (text item 1 of currentPart contains "Content-Type: image/") and (text item 1 of currentPart contains "Content-Disposition: inline") then
	if currentPartSize < minMIMEEncodedSize then
		-- If so, write that small encoded data into our mail source variable
		set messageMboxText to messageMboxText & currentPart & "\n" & partDelimiter
	end if
end if
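As an aside on the two thresholds: base64 turns every 3 bytes into 4 characters, and MIME wraps encoded lines at no more than 76 characters, so a raw byte limit and an encoded limit are related by simple arithmetic. The sketch below is one way to estimate that relationship. The handler name base64EncodedSize is hypothetical, and it assumes lines wrapped at exactly 76 characters with CRLF endings, which not every mail client does; it is not necessarily how minMIMEEncodedSize was chosen in the script.

```applescript
-- Hypothetical helper (not part of the original script):
-- estimate the size of rawBytes bytes of data once base64 encoded in a MIME part.
on base64EncodedSize(rawBytes)
	-- base64 emits 4 characters for every 3 input bytes, rounding up to a full group
	set encodedChars to 4 * ((rawBytes + 2) div 3)
	-- Assumption: lines are wrapped at 76 characters and each ends in CRLF (2 bytes)
	set lineCount to (encodedChars + 75) div 76
	return encodedChars + (2 * lineCount)
end base64EncodedSize

base64EncodedSize(50000)
--> 68424
```

So a 50,000 byte image would occupy roughly 68,000 bytes in the message source under those assumptions.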
This raised a couple of issues:
- The graphics file formats covered by DEVONthink records where record type is picture do not align with MIME parts of Content-Type: image/, so I can’t be assured I’m always deleting the right attachments and/or including the right MIME parts for any given message
- I wasn’t testing that attachments imported via import attachments of record which were < minPictureSize were also inline before deletion (I suspect I can’t even locate inline attachments after they’ve been imported via import attachments of record), which, on top of the mismatched sets of file types, further increases the discrepancy between the imported attachments I deleted and the MIME parts I kept.
I suspect these issues are insurmountable without decoding each MIME part into its original file data so I can test that decoded data before importing selected attachments.
I am most definitely not going to decode each MIME part into its original file data just so I can test it before importing selected attachments.
So, slight change in plans.
I will now not delete attachments imported via import attachments of record where record type is picture and their size is < minPictureSize.
I still won’t delete MIME parts which are of Content-Type: image/ and of Content-Disposition: inline if the currentPartSize < minMIMEEncodedSize, and I will still delete all other attachment MIME parts (except those of Content-Type: text/). So, in fact, I don’t need to re-process the message sources to “correct” this “mistake” (see below).
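The keep-or-drop rule for MIME parts described here can be sketched as a single handler. This is a minimal illustration, not the script itself: the handler name shouldKeepPart, its parameters, and the 68000 limit in the usage lines are all hypothetical, and it assumes the part's header block is passed in as plain text. It does not try to handle multipart containers or parts with no Content-Disposition header.

```applescript
-- Hypothetical helper (not part of the original script).
-- Returns true if a MIME part should stay in the rebuilt message source.
on shouldKeepPart(partHeader, partSize, sizeLimit)
	-- Text parts (the message body) are always kept
	if partHeader contains "Content-Type: text/" then return true
	-- Small inline images (most likely signature graphics) are kept
	set isInlineImage to (partHeader contains "Content-Type: image/") and (partHeader contains "Content-Disposition: inline")
	if isInlineImage and (partSize < sizeLimit) then return true
	-- Every other attachment part is dropped
	return false
end shouldKeepPart

-- Illustrative headers and sizes only
set inlineLogo to "Content-Type: image/png" & linefeed & "Content-Disposition: inline"
set attachedPDF to "Content-Type: application/pdf" & linefeed & "Content-Disposition: attachment"
{shouldKeepPart(inlineLogo, 12000, 68000), shouldKeepPart(inlineLogo, 90000, 68000), shouldKeepPart(attachedPDF, 12000, 68000)}
--> {true, false, false}
```

Because AppleScript string comparisons ignore case by default, headers written as "content-type:" would match as well.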
This allows (most) signature images to still display in the message, and I may or may not manually delete those at a later stage via a Smart Group like:
I can narrow it down further by only including .jpg, .jpeg, .png, and .gif files (the most likely graphics file types used in signatures).
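If I ever wanted to apply the same extension filter from a script rather than a Smart Group, a check along these lines might do. It is only a sketch: hasSignatureImageExtension is a hypothetical handler name, and it looks at nothing but the file name.

```applescript
-- Hypothetical helper (not part of the original script):
-- true if fileName ends in one of the extensions most likely used for signature images.
on hasSignatureImageExtension(fileName)
	set signatureExtensions to {"jpg", "jpeg", "png", "gif"}
	-- Split the name on "." and restore the delimiters afterwards
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set nameParts to text items of fileName
	set AppleScript's text item delimiters to savedDelimiters
	-- A name with no "." has no extension
	if (count of nameParts) < 2 then return false
	-- List membership ignores case by default, so "PNG" matches "png"
	return signatureExtensions contains (last item of nameParts)
end hasSignatureImageExtension

{hasSignatureImageExtension("Logo.PNG"), hasSignatureImageExtension("invoice.pdf")}
--> {true, false}
```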
For now, the implications:
- The script actually completed 2010 in about half the time (1h06m vs 2h00m), including recreating message sources
- I’ll need to re-import and re-process all the 2002+ year archives (except 2010). However, I don’t need to recreate the message source; I just need to re-import the picture attachments via import attachments of record and keep those < 50,000 bytes, so processing will be much quicker than prior runs, but importing the mailboxes will still be some level of pain
- For 2010, an extra 1,000 attachments were not deleted, and 15MB of extra space is used before any manual processing – so I’ll gain less space back than the above table implies (maybe < 5%, going by 2010 figures)
- There’ll be extra manual processing time after running the script if I want to get rid of obviously-signature-related images in DEVONthink
So my plan is to finish checking over the script for any other (now) obvious issues or improvements while I re-import the mailboxes, then run the re-processing and see where I’m at and post results and the next iteration of the script here.
The adventure continues!
Sean