As mentioned elsewhere, I’ve been thinking about how I’d like to handle e-mail in DEVONthink – much of this has been discussed before, but there are some things I’d like to do differently, and I’m using this as a DEVONthink (and automation) learning exercise as well.
In this initial post, I’m going to detail my primary aims, as well as some of the things I’ve discovered/worked out already.
There are some implementation matters I’m still nutting out, and would be interested in others’ thoughts.
I’ll occasionally throw in some half-assed snippets of things I’m working on as I discover interesting aspects of the process.
One of the things I’m certainly appreciating is that no solution is going to be perfect, so there will be some compromises. And compromises I’m prepared to make will not suit everyone, so I’ll never consider anything I do the “ultimate” solution to this issue.
But first, what’s my “problem” with things as they are built into DEVONthink?
My problem starts in wanting to do as much as possible in DEVONthink as efficiently as possible – and I understand the irony of wanting to be efficient while proposing to bring in 31 years of e-mails (totalling ~160,000 e-mails).
I’m not looking to replace my mail client (Apple Mail) for day-to-day use, but I’d like DEVONthink to benefit from the context of my archival e-mails when it comes to using the See Also and other categorisation tools.
If it works well, I’ll pull my “archiving window” closer to present day - at the moment, I do a big sweep of mailboxes in early January each year – if I start getting the benefits I’m hoping for, that might be done more frequently.
Also, a heap of the attachments in my e-mails are stored external to Mail as well, so there’s doubling up going on there, on top of the same attachment/file existing in multiple e-mails. Perhaps using replicants of extracted/imported attachments will be most efficient (noting some context is lost if there’s effectively one resultant file which is tied to only one originating e-mail).
So, because it leaves attachments in the e-mail items, importing attachments as I import e-mails using Settings > Files > Emails > Message Content > Import attachments is not going to help me out much in the efficiency stakes. It also doesn’t link the attachment to the message it was attached to.
The same considerations apply to importing attachments via the equivalent menu item (not surprisingly).
So I’m left with doing the importing via automation – I’m leaving the triggering (manual, on import, periodic smart rule) as a later decision.
I’ve done automation in several environments (AppleScript, Python, Keyboard Maestro, a few others, and combinations of any of those), but I’m certainly not what I’d call an expert. But I seem adequate at working out solutions which suit my use cases. I’d never suggest they’re the most efficient way to achieve what I’m trying to do, but they’re certainly aligned with my aims, and I enjoy the learning and research.
As mentioned elsewhere, I am working in AppleScript for this project, and I’m planning on utilising as much “plain” AppleScript as possible, so no additional languages (maybe some do shell script calls, but I’m even trying to avoid those) and no Keyboard Maestro (except maybe as a manual triggering mechanism), if I can avoid those.
Thankfully, Mail and DEVONthink have pretty extensive AppleScript dictionaries, and I’ve already worked out several components which achieve what I’m aiming for (or the necessary building blocks).
Some of the things I’ve been thinking about are:
- Do I want to delete attachments from e-mails after I import them?
- What’s the best way to link attachments to e-mails (and, potentially, vice versa)?
- How important is it to have To: and From: columns for navigation?
- I was considering having Read/Replied/Forwarded/Redirected status imported – will I actually benefit from this? [Hint: I’m moving away from bothering, although it would be relatively trivial compared to larger considerations in developing this solution]
- How do I handle multiple attachments to one e-mail with the same name? [They may not have the same data]
- What’s the best split of scripted load vis-à-vis Mail vs DEVONthink?
- And how much communication between them will I need to build in?
For now, let’s just look at some of the issues related to item 1 of that list…
To delete, or not to delete
Optimally, attachments I extract out of e-mails shouldn’t remain in the e-mail data, wasting space. Assuming, for the moment, that I come up with an acceptable e-mail↔︎attachment linking solution, I also need to consider the type of attachments.
I don’t want to just ignore inline attachments – for example, Apple Mail attaches single-page PDFs as inline attachments. Same with bitmap images. But many inline images are just small icons/logos in mail signatures, often duplicated in e-mail threads several times, often changing name as mail clients deal with this digital dross. So while I’ll want to consider importing some inline attachments, I’ll likely want to ignore others (or have to delete the unwanted DEVONthink items if one stage imports all attachments).
I’m expecting I’ll want to import all non-inline attachments as they are usually documents for applications (such as Word, Pages, Excel, or multi-page PDFs).
But if I utilise Apple’s “Remove Attachments” feature, it removes all attachments, both inline and non-inline, and I’ll lose the link between the attachment and the e-mail if I don’t plan carefully. While I just called them digital dross, logos/icons in attachments do maintain the look of the original e-mail, so I’d probably prefer to keep those in the e-mail rather than having some “picture missing” rectangles interspersed in e-mails.
So, ideally, I want to import, then delete from the original e-mail:
- all non-inline attachments
- all inline PDFs
- some subset of inline graphics files
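Those three rules can be sketched as a single yes/no test. This is in Python purely for illustration (the project itself is meant to stay in AppleScript), and the function name, the byte threshold, and the idea of using size to pick the "subset" of inline graphics are all assumptions made for the example, not settled decisions.

```python
# Hypothetical cut-off: inline images smaller than this are treated as
# signature logos/icons and left in the e-mail. The value is an assumption.
MIN_INLINE_IMAGE_BYTES = 20_000


def should_import(disposition: str, content_type: str, size: int) -> bool:
    """Return True if an attachment should be imported and then removed.

    disposition  -- "inline" or "attachment" (anything not "inline" is
                    treated as a regular, non-inline attachment)
    content_type -- MIME type such as "application/pdf" or "image/png"
    size         -- decoded size of the attachment in bytes
    """
    if disposition != "inline":
        return True                            # rule 1: all non-inline attachments
    if content_type == "application/pdf":
        return True                            # rule 2: all inline PDFs
    if content_type.startswith("image/"):
        return size >= MIN_INLINE_IMAGE_BYTES  # rule 3: only larger inline graphics
    return False                               # other inline parts stay in the e-mail
```

Whatever the final criteria turn out to be, keeping them in one small predicate like this means the import step and the deletion step can share exactly the same decision.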
So the simple “Remove Attachments” feature in Mail will not give me nuanced, selective deletion of only the attachments I have imported.
Which then brings a major complication – modifying the e-mail source data to remove whichever attachments have passed my tests for importing, but leaving in the ones I’ve not imported.
As pointed out elsewhere, the underlying structure of e-mail messages is complex. It is primarily defined by MIME: e-mails with attachments are inherently multipart MIME data, but attachments aren’t the only type of multipart data (think HTML vs plain text representations of the body of e-mails). It’s not necessarily straightforward to parse the data to be able to selectively remove some parts and leave the rest.
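To make that nesting concrete, here is a minimal sketch using Python's standard `email` module (again only as an illustration, not as part of the AppleScript approach). The sample message, addresses, and function names are invented for the example. It builds a message with a plain-text body, an HTML alternative, and one PDF attachment, then lists every MIME part it finds.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample() -> bytes:
    """Build a made-up message: text + HTML bodies and one PDF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Sample"
    msg["From"] = "a@example.com"
    msg["To"] = "b@example.com"
    msg.set_content("Plain text body")
    msg.add_alternative("<p>HTML body</p>", subtype="html")
    msg.add_attachment(b"%PDF-1.4 fake", maintype="application",
                       subtype="pdf", filename="report.pdf")
    return msg.as_bytes()


def describe_parts(raw: bytes):
    """Return (content type, disposition, filename) for every MIME part."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [
        (part.get_content_type(),
         part.get_content_disposition(),
         part.get_filename())
        for part in msg.walk()  # depth-first walk, containers included
    ]


if __name__ == "__main__":
    for row in describe_parts(build_sample()):
        print(row)
```

The listing shows a multipart/mixed container holding a multipart/alternative container (the two body representations) alongside the PDF, which is why "remove the attachments" cannot simply mean "remove every part after the first".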
This is where item 5 of my list also rears its ugly head – an e-mail might have two (or more) attachments with the same name, but one may be below my import size threshold and the other above it – how do I make sure I delete the right MIME part?
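One way around the same-name problem is to decide per MIME part rather than per filename. The sketch below, in Python for illustration only, walks the part tree and drops a part based on that part's own decoded size, so two attachments called `image001.png` are treated independently. The threshold value, the sample message, and the function names are assumptions for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

SIZE_THRESHOLD = 1_000  # hypothetical import threshold, in bytes


def build_sample() -> bytes:
    """Made-up message with two attachments sharing one filename."""
    msg = EmailMessage()
    msg["Subject"] = "Two logos"
    msg.set_content("See attached.")
    msg.add_attachment(b"x" * 100, maintype="image", subtype="png",
                       filename="image001.png")
    msg.add_attachment(b"y" * 5_000, maintype="image", subtype="png",
                       filename="image001.png")
    return msg.as_bytes()


def strip_large_attachments(raw: bytes):
    """Remove named parts at or above the threshold.

    Returns (new message bytes, list of (filename, data) removed).
    Parts are judged individually, never looked up by filename.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    extracted = []

    def prune(container):
        kept = []
        for part in container.get_payload():
            if part.is_multipart():
                prune(part)           # recurse into nested containers
                kept.append(part)
                continue
            if part.get_filename():
                data = part.get_payload(decode=True)
                if len(data) >= SIZE_THRESHOLD:
                    extracted.append((part.get_filename(), data))
                    continue          # drop this specific part only
            kept.append(part)
        container.set_payload(kept)

    if msg.is_multipart():
        prune(msg)
    return msg.as_bytes(), extracted
```

In this sample only the 5,000-byte part is removed and the 100-byte part with the identical name stays in the message. An AppleScript version would need an equivalent way to identify a part by position or content rather than by name.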
And when am I doing the deletion, anyway – before or after importing into DEVONthink? Each one has pros and cons, not only when considering the “optimal storage” aspect of what I’m trying to achieve, but also in the consistency of what DEVONthink actually sees.
@mdbraber has taken a two-script (and two-language) approach in his excellent solutions to the question of how we handle e-mail attachments in DEVONthink.
Between the two scripts, and the capabilities of the two languages and referenced libraries used within them, attachments which meet the defined criteria are selectively removed from the e-mail data in the DEVONthink item’s source file. An additional HTML MIME part is added to that source file with links to the attachments imported into DEVONthink, and the attachments are linked back to the originating e-mail (if I’m understanding what I’ve read without having closely studied the code).
The basic ideas and flows of those solutions have certainly informed my considerations about how I’d like to handle deletion of attachments after importing (and even whether I bother deleting anything/everything), but I think I can do much of the same processing in AppleScript alone.
There is, of course, the chance that doing so will not be as efficient as what @mdbraber’s Python script is doing, but as a learning exercise, and as an infrequently used tool (relative to other work within DEVONthink), I’m happy to wear any such additional performance cost.
And as a learning exercise, I’m enjoying playing around with possible alternative solutions to the same issues already solved in @mdbraber’s scripts, even if, in a hand-wavy sort of way, we’re trying to solve effectively the same “problem”.
I’ve been working on some smallish scripted components to work towards implementing whichever path I decide to take, and I still haven’t finalised which components and exact steps on that path I will use. I’ll maybe share some of those components in a followup post in the coming days.
For now, though, let me know if you have your own thoughts on my considerations, especially regarding item 1 – I don’t think any other individual item on my list will need as much consideration or scripting as that one.
Sean