Importing E-mail and Attachments via AppleScript

As mentioned elsewhere, I’ve been thinking about how I’d like to handle e-mail in DEVONthink – much of this has been discussed before, but there are some things I’d like to do differently, and I’m using this as a DEVONthink (and automation) learning exercise as well.

In this initial post, I’m going to detail my primary aims, as well as some of the things I’ve discovered/worked out already.

There are some implementation matters I’m still nutting out, and would be interested in others’ thoughts.

I’ll occasionally throw in some half-assed snippets of things I’m working on as I discover interesting aspects of the process.

One of the things I’m certainly appreciating is that no solution is going to be perfect, so there will be some compromises. And compromises I’m prepared to make will not suit everyone, so I’ll never consider anything I do the “ultimate” solution to this issue.

But first, what’s my “problem” with things as they are built-in to DEVONthink?

My problem starts in wanting to do as much as possible in DEVONthink as efficiently as possible – and I understand the irony of wanting to be efficient while proposing to bring in 31 years of e-mails (totalling ~160,000 e-mails).

I’m not looking to replace my mail client (Apple Mail) for day-to-day use, but I’d like DEVONthink to benefit from the context of my archival e-mails when it comes to using the See Also and other categorisation tools.

If it works well, I’ll pull my “archiving window” closer to present day - at the moment, I do a big sweep of mailboxes in early January each year – if I start getting the benefits I’m hoping for, that might be done more frequently.

Also, a heap of the attachments in my e-mails are also stored external to Mail, so there’s doubling up going on there as well as the same attachment/file existing in multiple e-mails, so perhaps using replicants of extracted/imported attachments will be most efficient (noting some context is lost if there’s effectively one resultant file which is tied to only one originating e-mail).

So, because it leaves attachments in the e-mail items, importing attachments as I import e-mails using Settings > Files > Emails > Message Content > Import attachments is not going to help me out much in the efficiency stakes. It also doesn’t link the attachment to the message it was attached to.

The same considerations apply to importing attachments via the equivalent menu item (not surprisingly).

So I’m left with doing the importing via automation – I’m leaving the triggering (manual, on import, periodic smart rule) as a later decision.

I’ve done automation in several environments (AppleScript, python, Keyboard Maestro, a few others, and combinations of any of those), but I’m certainly not what I’d call an expert. But I seem adequate at working out solutions which suit my use cases. I’d never suggest they’re the most efficient way to achieve what I’m trying to do, but they’re certainly aligned with my aims, and I enjoy the learning and research.

As mentioned elsewhere, I am working in AppleScript for this project, and I’m planning on utilising as much “plain” AppleScript as possible, so no additional languages (maybe some do shell script calls, but I’m even trying to avoid those), no Keyboard Maestro (except maybe as a manual triggering mechanism), if I can avoid those.

Thankfully, Mail and DEVONthink have pretty extensive AppleScript dictionaries, and I’ve already worked out several components which achieve what I’m aiming for (or the necessary building blocks).

Some of the things I’ve been thinking about are:

  1. Do I want to delete attachments from e-mails after I import them?
  2. What’s the best way to link attachments to e-mails (and, potentially, vice versa)?
  3. How important is it to have To: and From: columns for navigation?
  4. I was considering having Read/Replied/Forwarded/Redirected status imported – will I actually benefit from this? [Hint: I’m moving away from bothering, although it would be relatively trivial compared to larger considerations in developing this solution]
  5. How do I handle multiple attachments to one e-mail with the same name? [They may not have the same data]
  6. What’s the best split of scripted load vis-à-vis Mail vs DEVONthink?
  7. And how much communication between them will I need to build in?

For now, let’s just look at some of the issues related to item 1 of that list…

To delete, or not to delete

Optimally, attachments I extract out of e-mails shouldn’t remain in the e-mail data, wasting space. Assuming, for the moment, that I come up with an acceptable e-mail↔︎attachment linking solution, I also need to consider the type of attachments.

I don’t want to just ignore inline attachments – for example, Apple Mail attaches single-page PDFs as inline attachments. Same with bitmap images. But many inline images are just small icons/logos in mail signatures, often duplicated in e-mail threads several times, often changing name as mail clients deal with this digital dross. So while I’ll want to consider importing some inline attachments, I’ll likely want to ignore others (or have to delete the unwanted DEVONthink items if one stage imports all attachments).

I’m expecting I’ll want to import all non-inline attachments as they are usually documents for applications (such as Word, Pages, Excel, or multi-page PDFs).

But if I utilise Apple’s “Remove Attachments” feature, it removes all attachments, both inline and non-inline, and I’ll lose the link between the attachment and the e-mail if I don’t plan carefully. And while I just called them digital dross, logos/icons in attachments do maintain the look of the original e-mail, so I’d probably prefer to keep those in the e-mail rather than having some “picture missing” rectangles interspersed in e-mails.

So, ideally, I want to import, then delete from the original e-mail:

  • all non-inline attachments
  • all inline PDFs
  • some subset of inline graphics files

So the simple “Remove Attachments” feature in Mail will not provide the nuanced and selective deletion only of attachments I have imported.
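As a rough illustration of that ruleset (in Python rather than AppleScript, purely because its standard library can read MIME parts directly), the decision could be sketched as a single predicate. The function name `should_import` and the 20 kB image cut-off are assumptions made up for this sketch, not settled values.

```python
from email.message import EmailMessage

# Hypothetical cut-off: inline images smaller than this are treated as
# signature logos/icons and left in the e-mail.
IMAGE_SIZE_THRESHOLD = 20_000  # bytes


def should_import(part, min_image_bytes=IMAGE_SIZE_THRESHOLD):
    """Return True if this MIME part should be imported (and later removed)."""
    if part.is_multipart():
        return False  # containers hold parts; they are not attachments
    disposition = part.get_content_disposition()  # 'attachment', 'inline' or None
    if disposition == "attachment":
        return True  # all non-inline attachments
    if disposition is None and part.get_filename() is None:
        return False  # plain or HTML body text
    if part.get_content_type() == "application/pdf":
        return True  # all inline PDFs
    if part.get_content_maintype() == "image":
        payload = part.get_payload(decode=True) or b""
        return len(payload) >= min_image_bytes  # only the larger inline images
    return False
```

The size test is the only fuzzy rule here; everything else follows the three bullets above.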

Which then brings a major complication – modifying the e-mail source data to remove whichever attachments have passed my tests for importing, but leaving in the ones I’ve not imported.

As pointed out elsewhere, the underlying structure of e-mail messages is complex. It is primarily defined by MIME: e-mails with attachments are inherently multipart MIME data, but attachments aren’t the only type of multipart data (think HTML vs plain text representations of the body of e-mails). It’s not necessarily straightforward to parse the data to be able to selectively remove some parts and leave the rest.

This is where item 5 of my list also rears its ugly head – an e-mail might have two (or more) attachments with the same name, but one may be below my import size threshold and the other above it – how do I make sure I delete the right MIME part?
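One way around the same-name problem is to never look parts up by name at all: walk the MIME tree and apply the test to each part object in turn, so the decision is made on the part itself (its size, type and so on). Below is a minimal sketch using Python's standard `email` package; `strip_parts` and the `should_remove` callback are names invented for this example.

```python
import email
from email import policy


def strip_parts(raw_bytes, should_remove):
    """Remove every leaf part for which should_remove(part) is true.

    Returns (new_message_bytes, number_of_parts_removed).
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    removed = 0

    def prune(container):
        nonlocal removed
        if not container.is_multipart():
            return
        kept = []
        for part in container.get_payload():
            if not part.is_multipart() and should_remove(part):
                removed += 1  # drop this part, whatever its filename is
            else:
                prune(part)  # descend into nested multiparts
                kept.append(part)
        container.set_payload(kept)

    prune(msg)
    return msg.as_bytes(), removed
```

Two parts both called `scan.png` are handled independently here, because the callback sees each part's own payload rather than a shared name.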

And when am I doing the deletion, anyway – before or after importing into DEVONthink? Each one has pros and cons, not only when considering the “optimal storage” aspect of what I’m trying to achieve, but also in the consistency of what DEVONthink actually sees.

@mdbraber has utilised a two-script (and two-language) solution in his excellent solutions to the question of how we handle e-mail attachments in DEVONthink.

Between the two scripts, and the capabilities of the two languages and the libraries they reference, attachments which meet the defined criteria are selectively removed from the e-mail data in the DEVONthink item’s source file. An additional HTML MIME part is then added to that source file with links to the attachments imported into DEVONthink, and the attachments are linked back to the originating e-mail (if I’m understanding what I’ve read without having closely studied the code).

The basic ideas and flows of those solutions have certainly informed my considerations about how I’d like to handle deletion of attachments after importing (and even whether I bother deleting anything/everything), but I think I can do much of the same processing in AppleScript alone.

There is, of course, the chance that doing so will not be as efficient as what @mdbraber’s Python script is doing, but as a learning exercise, and as an infrequently used tool (relative to other work within DEVONthink), I’m happy to wear any such additional performance cost.

And as a learning exercise, I’m enjoying playing around with possible alternative solutions to the same issues already solved in @mdbraber’s scripts, even if, in a hand-wavy sort of way, we’re trying to solve effectively the same “problem”.

I’ve been working on some smallish scripted components to work towards implementing whichever path I decide to take, and I still haven’t finalised which components and exact steps on that path I will use. I’ll maybe share some of those components in a followup post in the coming days.

For now, though, let me know if you have your own thoughts on my considerations, especially regarding item 1 – I don’t think any other individual item on my list will need as much consideration or scripting as that one.

Sean

That is a function of Apple’s Mail program, and it is not directly accessible via AppleScript.

That is actually the easy part, imo. Scan for

boundary="some string"

at the beginning of the line. “some string” will be the boundary between the different mail parts. Split the mail at lines beginning with

--some string

What makes it a tad bit more complicated is that e-mails can be hierarchical:

boundary="first boundary"
…
--first boundary
boundary="second boundary"
…
--second boundary
…
--second boundary

--first boundary
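For comparison, here is a small sketch of the same nesting as seen by a MIME parser (Python's standard `email` package), which tracks the boundaries itself, including the closing `--boundary--` delimiter that ends each multipart section. The `SAMPLE` message and the `outline` helper are made up for illustration.

```python
import email
from email import policy

# A made-up message with one multipart nested inside another.
SAMPLE = (
    b'MIME-Version: 1.0\n'
    b'Content-Type: multipart/mixed; boundary="first boundary"\n'
    b'\n'
    b'--first boundary\n'
    b'Content-Type: multipart/alternative; boundary="second boundary"\n'
    b'\n'
    b'--second boundary\n'
    b'Content-Type: text/plain\n'
    b'\n'
    b'hello\n'
    b'--second boundary\n'
    b'Content-Type: text/html\n'
    b'\n'
    b'<p>hello</p>\n'
    b'--second boundary--\n'
    b'--first boundary\n'
    b'Content-Type: application/pdf\n'
    b'Content-Disposition: attachment; filename="a.pdf"\n'
    b'\n'
    b'JVBERi0=\n'
    b'--first boundary--\n'
)


def outline(raw_bytes):
    """Return (depth, content type, filename) for every part, in order."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    rows = []

    def visit(part, depth):
        rows.append((depth, part.get_content_type(), part.get_filename()))
        if part.is_multipart():
            for child in part.get_payload():
                visit(child, depth + 1)

    visit(msg, 0)
    return rows
```

Calling `outline(SAMPLE)` lists the outer container, the nested alternative container with its two bodies, and the PDF attachment, each with its depth in the hierarchy.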

First, I’d save the things, each with its own name (append digits or so). Keep the relationship between the original place and the saved attachment. Then later decide what to remove and replace with a link to the saved attachment. If you decide not to delete the attachment from the e-mail, delete the attachment from the disk. Whether “saving” means “creating a record in DT” or simply “writing to disk” is, imo, a matter of taste.
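The "append digits" idea might look something like this minimal sketch. The helper name `unique_name` and the `-2`, `-3` suffix style are assumptions for the example; any scheme works as long as the mapping back to the original part is kept somewhere.

```python
import os


def unique_name(filename, taken):
    """Return filename, or filename with -2, -3, ... inserted before the
    extension, so that it does not clash with any name already in `taken`.

    `taken` is a set of lower-cased names and is updated in place.
    """
    stem, ext = os.path.splitext(filename)
    candidate, n = filename, 1
    while candidate.lower() in taken:
        n += 1
        candidate = f"{stem}-{n}{ext}"
    taken.add(candidate.lower())
    return candidate
```

Names are compared case-insensitively here on the assumption that the files may land on a case-insensitive volume.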

As to item 1: Since I rarely import e-mails into DT, I have no opinion on that. On an abstract level, avoiding duplicate data and saving space is a good argument to delete the items from the e-mail.

I’d avoid Mail scripting like the plague and simply export the relevant mailbox as an mbox file, then work on that.
Mail’s scripting support is fairly limited and some of it is buggy. And it does not provide for removal of attachments, for example, which is trivial in the textual message.

I’m still wondering why you’d want to use AppleScript for all that, given that JXA has more advanced string and array handling capabilities (and is closer to Python).

Thanks for your input.

It apparently is via System Events:
tell application "System Events" to click menu item "Remove Attachments" of menu "Message" of menu bar item "Message" of menu bar 1 of process "Mail"

But I haven’t even tried that as I am strongly leaning towards not removing all attachments (see my discussion about leaving smaller inline graphics in place):

See my second Update in the other topic:

Getting to that stage was what got me over the line for starting this topic.

I’m only explicitly looking for the attachments, and boundary lines on their own lack the elements found in attachment parts. So I’m treating the parts as a flat structure and ignoring parts which don’t have attachment-relevant information for the moment. It may end up meaning there’s nothing in some hierarchical elements (i.e. no attachments between boundaries, if that makes sense).

I’m leaning towards utilising the strengths of each program in a sort of hybrid approach, and I am leaning towards importing a .mbox file at some stage. At the moment, a workable solution is better than a “most efficient” one, and I’m definitely planning on some gross inefficiencies to reduce my programming tasks (not AppleScript’s fault, I know it could do the tasks I’m going to be avoiding). More details later.

Because it’s what I’m more familiar with and more comfortable with.

Basically, I’m a dilettante, not an expert, and I go where my interests lie and where I feel comfortable.

And I think I’ll be able to get AppleScript to do what I want, despite its limitations – and I will struggle in JXA because of my own limitations.

Sean

Well, that’s UI scripting. Not what I’d call “directly accessible”.

Next item on the list is item 2: the best way to link attachments to e-mails (and, potentially, vice versa).

This one has also been taking up considerable space in my brain.

One of the major aspects of DEVONthink I’m looking forward to is linking – both one way and two way.

And once again, a plethora of alternative solutions means some decisions need to be made.

@mdbraber uses an elegant solution of attaching a new MIME part (of type text/html) with DEVONthink links to the removed attachments. I must admit I’ve not delved deeply enough into those scripts to know if the attachments are also linked to the original e-mail DEVONthink item.

So that’s one of the things I considered replicating in my solution, and for now, I’ve decided not to. It would be relatively easy to implement (relative to some other things I’m trying to do), so I will keep it in the back of my mind.

And this brings up the general consideration of “Where am I going to be saving these links between DEVONthink items?”

We’ve got e-mails with attachments and we’ve got attachments from e-mails, and we want to store links between them for contextual benefits. And, as per other statements, I’m wanting to delete attachments I’ve imported into DEVONthink from the original e-mail.

E-mails, when imported, are assigned a mailto: URL so you can just launch that URL and be replying with a message including, all things going according to plan, an “in-reply-to” header to be able to tie it to the original e-mail.

Sidenote: this utilises a relatively (I think) unknown feature of mailto: URLs – you can specify things like the Subject, From, other headers…even the Body of the e-mail you are generating from the URL. I’ve used this as a “shortcut” web response generator in the past to show in the Subject (if the sender doesn’t amend it, of course) that the e-mail was in response to clicking a mailto: URL on my website.
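To make the sidenote concrete, here is a hedged sketch of building such a URL. The header fields are percent-encoded and appended after a `?`, in the style described by RFC 6068. The function name `mailto_url` is invented for this example, and the exact URL DEVONthink assigns to imported e-mails may well differ in detail.

```python
from urllib.parse import quote


def mailto_url(to, subject=None, body=None, in_reply_to=None):
    """Build a mailto: URL with optional Subject, Body and In-Reply-To fields."""
    fields = []
    if subject:
        fields.append("subject=" + quote(subject, safe=""))
    if body:
        fields.append("body=" + quote(body, safe=""))
    if in_reply_to:
        # The angle brackets of a Message-ID must be percent-encoded too.
        fields.append("In-Reply-To=" + quote(in_reply_to, safe=""))
    url = "mailto:" + quote(to, safe="@")
    if fields:
        url += "?" + "&".join(fields)
    return url
```

A website link built this way with a fixed subject is what lets the recipient see that a message started from that link, provided the sender leaves the subject alone.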

Given many messages have multiple attachments, we need a way of tracking the attachments from the e-mail that doesn’t rely on a “single item” field, and we certainly don’t want to overwrite the default assigned mailto: URL for e-mail messages.

So, we can use a solution similar to @mdbraber’s, which appends the links inside the e-mail message, or we could include the links to attachments in an Annotation (which unfortunately [to my mind {when considering reducing clutter}] creates an additional Annotation DEVONthink item).

I suppose we could create a multi-line metadata item, but I suspect the links in that field would not be clickable/launchable as separate entities the way they are in a text/html part or a DEVONthink Annotation.

And we wouldn’t be able to otherwise use a single-item URL or Item Link custom metadata field to record the (potentially) multiple links to attachments.

As a diversion for the moment (there’s a reason for this diversion, trust me), interestingly, going the other way we don’t have those constraints (at least initially, see below regarding duplicates/replicants).

Each attachment I’ll be importing will be imported from a single e-mail, so using “single-item” metadata fields, either standard or custom is feasible.

The options are:

  • Use the standard URL field for an imported attachment item to be set to the DEVONthink item link of the original e-mail;
  • Create a custom metadata field of type URL to store the same thing – a DEVONthink item link of the original e-mail;
  • Create a custom metadata field of type Item Link to point directly to another DEVONthink item, in this case, the originating e-mail item.

Upon experimentation, I’ve considered some pros and cons of each of these “link back to source e-mail in DEVONthink” methods. Here are the basics (let me know if I’ve missed anything):

| Method | Pro/s | Con/s |
|---|---|---|
| Built-in URL field | It’s built in, nothing non-standard needs to be configured, and it’s on the record’s base Info Inspector | You may wish to use this field for something else (the origin’s website, for example); column view not very user-friendly |
| Custom URL metadata field | Frees up standard URL field | Column view not very user-friendly; a custom metadata field needs to be created |
| Custom Item Link metadata field | Column view shows user-friendly item name of originating e-mail | A custom metadata field needs to be created |

Here’s what column view for those three options looks like:
[image: column view of the three link fields]

The URL/link in all three column types can be opened by right-clicking on the URL/Link and choosing “Launch URL”, and opening those links from an attachment’s Inspectors pane is about equally straightforward across the three options, so it comes down to whether you’d rather avoid creating additional non-standard metadata fields or display the linked item nicely.

While preparing this post and working through how these fields operate, I’m currently leaning towards the Item Link custom metadata field, but I’ll consider how I might be able to accommodate any of the three based on the user’s preference – I may be able to initiate the target for the link via properties set in my AppleScript on first run (changes to AppleScript properties are maintained between runs, so they can be used as persistent variables for such preferences).

The attachment record’s base URL field is a really close second place contender here because of its ready availability and need for less bespoke DEVONthink configuration, so I may well change my mind as I work through all this.

In fact, I’ll probably use both the standard URL field and a custom Item Link metadata field, at least at the start, so I get the best of both worlds (yes, I could get the best of all three worlds, but I don’t want to be super-crazy!).

After considering all this, I’ll now return to the links to the attachments from the original e-mail…

If you recall, I was wondering how best to record links back to the attachments – do I use an appended text/html MIME part? Do I create an Annotation with the links? Do we try and get multi-line text fields to have multiple launchable URLs?

I’ve actually decided I’ll be lazy, and not store those URLs anywhere!

But I’m not going to lose equivalent functionality, I’m just going to rely on a built-in feature of DEVONthink, which maintains these links for me automatically: Incoming Links.

That’s right, by using the Incoming Links section of the Links tab of the Document Inspector for an e-mail which has been linked to by its attachments, DEVONthink is already showing me those attachments I’ve already linked back to the originating e-mail!

Why duplicate effort (and increase script complexity) by trying to create MIME parts or Annotations and tracking which attachment I’m up to to make sure I include them all while looping through all that?

Now, there is one implication of such laziness: No matter the “linking back” method used, if the link in the attachment record pointing back to the originating e-mail is ever amended/deleted, I will lose that connection between e-mail and attachment in the Incoming Links display.

At the moment, I’m considering that an acceptable risk. I also believe should that risk assessment change, I can revise the script to update the base e-mail’s records with one of the “include attachments links” options discussed above.

So that’s where I’m currently at with item 2 of my list.

What do you think? What have I missed? What could I do better?

Sean

Tomato/tomato.

Whatever that is supposed to mean. I’m not a native speaker, so some subtleties might escape me (as others here, btw).
If you are referring to UI scripting: though it is feasible, many people here discourage it. In your example, you use terms like menu bar item "Message". Which is fine for you, because you’re using an English locale. But there’s this to consider:

There are ways around the issue, which make the code even less maintainable and understandable. I don’t know if that has anything to do with vegetables or fruit, though.

I know you’re writing that stuff for yourself. But since you’re talking here at length about your thought processes, it is ok to mention that some approaches are not useful for everybody. IMO.

Perhaps try to write a bit more condensed? I find it difficult to follow your thought processes if they go on and on and on. You want feedback, which might be helped by keeping your audience interested.


Apologies – “tomato/tomato” is a reference to an old song about the different ways of looking at the same thing (in this case, and others in the song, pronunciation of the same word).

I have explicitly said I’m not looking at the “Remove Attachments” command as a viable option, so “let’s call the whole thing off” (apologies, another reference to the same song). No need any longer for either of us to get caught up in how, or whether, to initiate that command via AppleScript, or any other way, at this juncture.

Please feel free to check out whenever you feel it suits you. I have already mentioned elsewhere I am a long form writer. If that doesn’t suit you, that’s fine, no judgement. I don’t expect anyone to read what I write. I am not here to cater to the “widest” audience, and my writing career has never been targeted at doing so.

Of course some approaches are not useful for everybody! I acknowledge that several times in relation to my own processes. I do not expect anyone to do as I do, so you are free to ignore what I say, and therefore should not feel any obligation to read what I write, nor respond to it!

Everyone is free to ignore everyone else! So please feel free to ignore me at your earliest convenience.

I get the feeling from various responses here and elsewhere that you may not actually be part of my “audience”. No shame in that on either side. I write as I do, long form or not, often as a stream of consciousness (even if it’s after several hours of labour I’m trying to convey), and I’ll either find an audience or I’ll speak into the void. I am certainly content either way. I can only hope you find such contentedness, as well.

Namaste.

Sean

Moving on, let’s consider some of the basic, and not so basic, attributes of e-mail messages, and how they’re handled by DEVONthink, and if we wish to have different behaviour somehow.

I’ve been playing around, and the To: and From: attributes of e-mails are meant to be covered in DEVONthink via the Recipient and Author metadata attributes. These don’t seem to be displayable as columns, however (and search results don’t match those in Mail). So I could import them into custom metadata fields for completeness.

Also, I don’t think BCC: recipients on sent messages are available for search or display as columns in DEVONthink. So I may wish to import those, as well.

Mail’s AppleScript support allows these fields to be determined for any given e-mail, so it would be possible to capture those and assign them to custom metadata fields, while the default behaviours for To: and From: fields (and, I think, CC:) are maintained.

Mail’s AppleScript support also allows the capturing/reporting of various status indicators of messages, including Unread, Replied, Forwarded and Redirected.

While the completist in me thought of capturing these statuses into custom metadata fields, I am seriously reconsidering the utility of doing so. I’m just not sure it adds anything to my DEVONthink e-mail archival purposes.

It’s certainly not because it would be difficult to capture this info into DEVONthink – in fact, I’ve already worked out how to do so – it’s just that I think there is very little utility in doing so vs what I’m actually trying to achieve from importing e-mail and attachments into DEVONthink.

In my spectrum brain, there’s a constant struggle between what I could do vs what I should do. It’s made even harder, as in this case, when could is not much more effort than not doing something.

I may be complacent, but I’m already seeing how I could revisit this decision and import this metadata later if it ever seems necessary – for now, it doesn’t feel necessary, so I won’t include it in the initial set of script/s.

And this actually, tangentially, brings me to an important realisation I’ve had over the last few days. Nothing, and I mean pretty well nothing I do in relation to my archival e-mails is final in relation to what I import into DEVONthink.

Let’s say I import the e-mails, then decide I want to capture the unread/replied/forwarded/redirected status sometime in the future.

I could either try and overlay those attributes on existing DEVONthink records – or I could just delete the existing e-mail records and re-import them, importing the new attributes at the same time!

Any DEVONthink links to the e-mails will actually still work, as the links (and record UUIDs) are based solely on the unique Message-ID headers of the original records. So, in effect, the same message re-imported is, as far as DEVONthink is concerned, the same record!

Import attachments and link them to the e-mails as per the posts above, then delete the e-mails and re-import them to capture additional metadata? The links from the originally-imported attachments will still point to the newly imported records! Ha!

OK, things aren’t quite as perfect as all that, and here’s why (and here’s why it’s important to consider all parts of a process when working on it)…

If, as I have postulated in other posts, I’m going to import messages, and import some, but not all, attachments, and somehow delete the attachments from the e-mail message records in DEVONthink (all conceivable after what I’ve experimented with), I am, by current estimations, going to be importing, or re-importing, the e-mails without the imported attachments from a .mbox file…which won’t keep the unread, replied, etc. status. Dang!

While this isn’t the reason I’m currently not considering importing these statuses, if I changed my mind it would require maybe having a script to check the status on unmodified messages in Mail to overlay those statuses on the modified messages in DEVONthink…maybe. Or maybe I’ll need to figure out another way of doing so.

Not elegant, but not unthinkable.

The point is, before finalising the script, I’ll want to try and nail down as much as possible, and really think about what I’m trying to achieve.

But that’s where all the fun is, right?

Sean

@stratadata

Are you sure you want to make it this complex?

This script by @cgrunenberg works with DT4 to act on an .eml file by creating a new group and placing the original email and the attachments in that group.

It doesn’t remove the attachments from the original email but that’s either a bug or a feature depending on your perspective - and storage costs are cheap.

Why reinvent the wheel with all the other features you are considering but probably will not use that often?

tell application id "DNtp"
	try
		repeat with theRecord in (selected records whose record type is email)
			set theName to name of theRecord
			set theModificationDate to modification date of theRecord
			set theCreationDate to creation date of theRecord
			set theURL to reference URL of theRecord
			
			set theGroup to create record with {name:theName, type:group, modification date:theModificationDate, creation date:theCreationDate} in (location group of theRecord)
			move record theRecord to theGroup
			
			set importedAttachments to import attachments of record theRecord to theGroup
			
			repeat with importedAttachment in importedAttachments
				set URL of importedAttachment to theURL
			end repeat
		end repeat
	on error msg
		display dialog msg
	end try
end tell

It’s definitely intentional as the command just imports the attachments.

I’d rather not have a group per e-mail for archival e-mails. At the moment, I have a group per year, and I’ll likely put a group within that for all the attachments that I’m extracting for that year.

“Cheap” is not necessarily “affordable” for everyone.

I’m in between jobs, so don’t want to splurge on a new internal HD in my iMac, which I have already upgraded to 2TB, so I’m not looking to upgrade again. It is about 90% full (I work in a lot of virtual environments) and remains pretty consistently so.

I am not interested in using external storage, nor can I afford a NAS (and I’m WiFi only at the moment, so that wouldn’t be ideal, anyway).

So that’s not an option to relieve any squeeze, and so I see a fair bit of benefit in regaining that space if I do import the attachments as their own DEVONthink record.

Because it’s fun to explore and learn, and I’m learning a lot about how DEVONthink stores and refers internally to its records, and that’s a good thing. Keeps me off the streets! :rofl:

For example, while I might use @cgrunenberg’s script as the basis for the DEVONthink elements of what I’m doing, I would change the storage location of the attachments, leave the e-mail where it is, remove small image attachments and may choose to link the remaining attachments back to the e-mail in a metadata field as I describe above.

So while similar to @cgrunenberg’s script, even if I just end up doing that part and not deleting the attachments, it will definitely not be the same script with the same outcomes.

Sean

Which is as I expected :slight_smile:

NOTE - UPDATED with some additional thoughts/learnings, see [UPDATE] sections.

We’re on the home stretch in relation to my list of “things I’ve been thinking about”, so maybe we can knock the rest over in this post. Next cab off the rank is item 5: how to handle multiple attachments to one e-mail with the same name.

I’ve literally just now realised this is not actually going to be an issue for me given where I think I’m heading, and the details will be in item 6 in a moment.

Gotta say, I’m pleased I won’t be testing each part for filenames and, if two (or more) parts have the same name, checking whether they’re the same size, whether other metadata is different, or comparing hashes.

I was not looking forward to that!

Well, that was quick (“Hooray!” I hear).

Let’s get the list finalised, then I’ll describe my basic script flow.

I’ve settled on the following split:
Mail: Base mail data source, which will be used to create the .mbox containing e-mails stripped of attachments of a certain type, namely, those which are not bitmap images below a certain size. PDFs don’t count as images.
DEVONthink: Will import the attachments directly from the original mail source (I’ve already imported the messages), then delete the imported attachments which are the complement of what I’m going to delete in Mail – so bitmap images below a certain size.

Candidate attachments for deletion in DEVONthink (and therefore left in the generated .mbox) may change beyond just bitmap images below a certain size. I’m going to do a sample import of attachments from a group containing a year’s e-mails and decide which ones I can not bother importing in future. So, for example, I may exclude .ics files.

The two stages will be almost separate, or at least could be run separately, but here’s what I think:

  • In DEVONthink:
    • Import original e-mails into DEVONthink
    • Have a group or search showing the e-mails with attachments
    • My script imports the attachments from those messages
      • deletes the attachments I don’t want to keep as independent DEVONthink items following my ruleset
      • As the attachments are imported and left remaining, they are linked to the originating message through a metadata field (likely the standard URL field, but it can be a custom Item Link metadata field)
      • [UPDATE] Track whether any attachments are deleted or not. If all are deleted in DEVONthink, we don’t need to change the original e-mail data
  • In Mail:
    • [UPDATE] For the imported original e-mails, the MIME parts are reviewed if the message has attachments, and an .mbox is appended with the e-mail data
      • parts with attachments which match to the ruleset are appended to the .mbox
      • parts with attachments which do not match the ruleset are not appended to the .mbox
      • If a message has no deleted parts, don’t append it to the .mbox
    • Once the .mbox is created, we return to DEVONthink
    • Find the message in Mail by Message-ID (best to do this on a “known” folder structure to limit search scope)
    • Append mail data to .mbox excluding MIME parts correlated to attachments not deleted in DEVONthink
      • Might consider tracking number deleted, but that just might be too fussy for the times there might be a discrepancy
  • In DEVONthink:
    • Delete the messages with attachments to be removed
      • [UPDATE] Might need to do this as Mail is appending messages to the .mbox (can be done by deleting messages via DEVONthink UUID, which is Message-ID from e-mail headers, an e-mail attribute Mail can discover via AppleScript natively)
    • Import the .mbox with the e-mails with deleted parts to replace the originals with all the attachments
      • [UPDATE] could do this per message, but I think I’d rather do a batch import
      • because the UUID is the same, and the URL for the message is the same, the existing links to the messages will continue to work.

Notes:

  • I still have the original, unmolested e-mails in Mail in case anything goes wrong
  • Which means I still have all the attachments
  • At some point I’ll delete the messages from Mail, probably archiving them somewhere else somehow
  • While it means e-mails with deleted attachments end up being imported twice, the import of the deleted attachments, including those with duplicate names, is handled by DEVONthink, not my script, so I’m not decoding them, or getting Mail to save them somewhere to then import and link separately
  • The Message-ID to UUID link is what will bind the actions in the two applications
  • There’s a pretty basic algorithm to determine the base64 encoded size of a file of any given size, so I don’t have to try to match Mail’s list of attachments to the base64-encoded parts when excluding attachments above or below a certain size – for example, for a 50,000 byte file base64 encoded into lines split every 76 characters, the base64-encoded data is 67,545 bytes (66,668 encoded characters plus 877 single-character line breaks). In DEVONthink I’m testing based on decoded file size; in Mail, I’m testing based on the encoded size
  • It’s going to be slow, but that doesn’t really bother me – I’ll work backwards chronologically
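
The base64 size estimate mentioned in the notes can be sketched in plain AppleScript. This is only an illustration, not the script's actual code: the handler name base64EncodedSize and its lineBreakLength parameter are made up here, and it assumes 76-character lines separated by a line break, with no break after the final line (which is what reproduces the 67,545 figure for a 50,000 byte file).

```applescript
-- Hypothetical helper: estimate the size in bytes of a base64-encoded attachment.
-- Assumes 76-character lines separated by a line break of lineBreakLength bytes
-- (1 for LF, 2 for CRLF) and no line break after the final line.
on base64EncodedSize(decodedBytes, lineBreakLength)
	if decodedBytes ≤ 0 then return 0
	-- Every 3 input bytes (rounded up) become 4 output characters
	set encodedChars to ((decodedBytes + 2) div 3) * 4
	-- Number of line breaks sitting between the 76-character lines
	set lineBreaks to (encodedChars - 1) div 76
	return encodedChars + (lineBreaks * lineBreakLength)
end base64EncodedSize

base64EncodedSize(50000, 1) -- 67545
```

If the message source uses CRLF line endings, passing 2 as the second argument gives a slightly larger threshold, so it may be worth checking which line ending the raw source actually contains before settling on a cut-off.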

I still need to build my full list of attachment types to be deleted (or excluded from deletion, depending on which app is doing the deletion).

And make a few choices between alternatives (where to store link back to originating e-mail, for example).

But I’m only dabbling at this point as I’m not in a rush, so results might be some time coming.

Sean

OK! Getting much closer after a very productive day.

I have some try blocks to put in, I need to specify the target group for the “imported but not deleted” attachments, do a little refactoring to make things make more sense – but for a selected message, the script is basically working and creating the .mbox data without the kept DEVONthink imported attachments included.

I’ve only been testing on a couple of messages, so certainly more testing to do, but I’m pretty happy and it’s currently <60 lines and uses no scripting additions except Standard Additions.

It also does not have any do shell script calls or rely on any additional scripts or scripting languages.

Oh, and I’ll do some commenting, but it’s pretty straightforward, I believe (my scripts usually need to be so I can understand them!).

I’m getting pretty close to being willing to share with some beta testers (I believe @smiling might be interested), so PM me if you’re interested before I do a broader distribution by posting here.

I’m hosting a (mostly) Apple ][ retrocomputing gathering tomorrow, so might not get much more done till Sunday – but I’m gonna keep plugging away as I can on the above points.

Hope y’all have a great weekend!

Sean


To construct the .mbox, I need to prepend the message source with the received date and time as GMT, so I think I’ll need to use Foundation to end up with something like:

use AppleScript version "2.5" -- El Capitan (10.11) or later; AppleScript dates only bridge to NSDate from 10.11
use scripting additions
use framework "Foundation"

-- Local time zone, including its DST rules
set tz to current application's NSTimeZone's localTimeZone()

tell application "Mail"
  repeat with currentMessage in (get selection)
    set dateReceived to the date received of currentMessage
    -- Offset from GMT in seconds on that specific date, so DST is accounted for
    set theOffset to (tz's secondsFromGMTForDate:dateReceived)
    set dateReceivedGMT to dateReceived - theOffset

    -- do some things

  end repeat
end tell

Will implement that and see where I land.
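
For context, each message in an mbox file is preceded by a separator line that starts with "From ", followed by the sender address and a timestamp in the C asctime layout. Below is a rough sketch of building that line in plain AppleScript from a date that has already been shifted to GMT. The handler names zeroPad and mboxFromLine are invented for this example, and it relies on weekday and month constants coercing to their English names, which I believe holds regardless of system language.

```applescript
-- Hypothetical helpers: build an mbox "From " separator line from a date
-- that has already been shifted to GMT (such as dateReceivedGMT).

on zeroPad(n)
	if n < 10 then return "0" & n
	return n as text
end zeroPad

on mboxFromLine(senderAddress, gmtDate)
	-- Weekday and month constants coerce to names such as "Friday" and "January"
	set dayName to text 1 thru 3 of ((weekday of gmtDate) as text)
	set monthName to text 1 thru 3 of ((month of gmtDate) as text)
	-- asctime pads the day of the month with a space, not a zero
	set dayText to text -2 thru -1 of (" " & (day of gmtDate))
	set secs to time of gmtDate -- seconds since midnight
	set timeText to zeroPad(secs div 3600) & ":" & zeroPad((secs mod 3600) div 60) & ":" & zeroPad(secs mod 60)
	return "From " & senderAddress & " " & dayName & " " & monthName & " " & dayText & " " & timeText & " " & (year of gmtDate)
end mboxFromLine

-- Example with a date built field by field to avoid locale-dependent date strings
set d to current date
set day of d to 1
set year of d to 2024
set month of d to January
set day of d to 5
set time of d to 3661
mboxFromLine("sender@example.com", d) -- "From sender@example.com Fri Jan  5 01:01:01 2024"
```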

Anyone know how to do the equivalent for any arbitrary date (not just using the current offset, we use DST) without using Foundation?

Google’s not helping so far.

Although I just found this plain AppleScript-only method which requires a do shell script.

What are people’s thoughts on using Foundation?

Either of the above methods will mean I’ll need to let go of my “no external libraries, etc” aim, but I’ll have to be pragmatic if I have to be pragmatic.

Sean

First “full” test completed – all 1185 e-mails with attachments out of 3605 in my 2024 archive processed in 35 minutes, resulting in 777 truncated e-mail messages successfully imported from the generated .mbox.

I’ll be able to skip those which are less than the base64 encoded size of my minimum size to leave in the e-mail, so that’s one improvement to implement.

I want to investigate the progress bar display.

I still haven’t put try blocks in (happy it made it all the way through the e-mails).

And I want to put a “firstRun” function in to set minimum size to keep, etc.

But one thing I’ve realised today is that another benefit of extracting the attachments from the e-mails is not just decreased storage if I make duplicate attachments replicants (I’ll probably only do that selectively), but as base64 encoded data takes up 33% more space than the unencoded data, I will save some space, too!

Looking at the relative sizes of things, the imported attachments for those 777 e-mails total 732.2MB – base64 encoding would increase that to 976.3MB, so a saving of almost a quarter of a GB of space on one mailbox alone.

The saving would be a little more due to the extra carriage return every 76 (usually) characters of encoded data, but that would make less than 1.5% difference in my calcs.

The .mbox with the truncated e-mails is 74.9MB, and the original untruncated e-mails in DEVONthink report as 1.1GB, so everything’s back-of-the-envelope consistent.

A good few days’ work.

A few loose ends before first pre-release here in the forums, but tomorrow after a little more loose end tying, I’ll be happy to share what I have for a few others to test.

Good night, y’all!

Sean

Script is now working through 2023 attachments.

In relation to my last post’s idea of skipping e-mails smaller than the base64 encoded minimum size:

I won’t, actually, be able to do this – there might be some attachments < 50,000 bytes (or the set minimum) that I want to go through the process. Some PDFs are quite small, so I’ll still want them imported to DEVONthink and deleted from the e-mail.

I now have a progress bar showing [current e-mail]/[total e-mails] ([elapsed hh:mm:ss time]/[estimated total hh:mm:ss time]): [record name]

[image: screenshot of the progress bar]

That estimated time will continue to non-linearly decrease as the script gets to messages with fewer attachments (I’m sorting descending by number of attachments to just select those with attachments).

Maybe I should do estimated remaining time, instead of estimated total time - thoughts?
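
For anyone weighing up the two options, here is a small sketch of how the remaining-time variant could be computed and formatted in plain AppleScript. All handler names (zeroPad, formatHMS, estimateRemaining, progressText) are invented for this example, and the estimate is a simple linear one based on the average time per e-mail so far, so it will swing around in the same way as the total estimate when early messages carry more attachments.

```applescript
-- Hypothetical helpers for the progress text; names are illustrative only.

on zeroPad(n)
	if n < 10 then return "0" & n
	return n as text
end zeroPad

-- Format a whole number of seconds as hh:mm:ss
on formatHMS(totalSeconds)
	set h to totalSeconds div 3600
	set m to (totalSeconds mod 3600) div 60
	set s to totalSeconds mod 60
	return zeroPad(h) & ":" & zeroPad(m) & ":" & zeroPad(s)
end formatHMS

-- Linear estimate of the seconds still needed, from the average time per e-mail so far
on estimateRemaining(elapsedSeconds, doneCount, totalCount)
	if doneCount < 1 then return missing value
	return (elapsedSeconds * (totalCount - doneCount)) div doneCount
end estimateRemaining

-- Build the text shown next to the progress bar
on progressText(elapsedSeconds, doneCount, totalCount, recordName)
	set remainingSeconds to estimateRemaining(elapsedSeconds, doneCount, totalCount)
	set remainingText to "--:--:--"
	if remainingSeconds is not missing value then set remainingText to formatHMS(remainingSeconds)
	return (doneCount as text) & "/" & totalCount & " (" & formatHMS(elapsedSeconds) & " elapsed, " & remainingText & " remaining): " & recordName
end progressText

progressText(600, 100, 1185, "Invoice")
-- "100/1185 (00:10:00 elapsed, 01:48:30 remaining): Invoice"
```

The elapsed value can come from subtracting a start date from (current date), which gives whole seconds in AppleScript.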

A try block is now in place to ensure the hide progress indicator command runs if there’s an error.
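
The general shape of that error handling might look something like the sketch below. It is not the actual script: the handler name processRecords and the placeholder loop body are made up, and it assumes DEVONthink's show progress indicator, step progress indicator and hide progress indicator commands behave as described in its scripting dictionary.

```applescript
-- Sketch only: the handler name is hypothetical and the loop body is a placeholder.
on processRecords(theRecords)
	tell application id "DNtp"
		show progress indicator "Importing attachments" steps (count of theRecords)
		try
			repeat with theRecord in theRecords
				step progress indicator (name of theRecord)
				-- import and prune the attachments for theRecord here
			end repeat
			hide progress indicator
		on error errorMessage number errorNumber
			-- Always remove the progress bar, then pass the original error on
			hide progress indicator
			error errorMessage number errorNumber
		end try
	end tell
end processRecords
```

Re-raising the error after the clean-up keeps the failure visible in Script Editor or DEVONthink's log instead of silently swallowing it.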

The firstRun handler is still on my to-do list (the only major item left there), but I’m leaving it for later. There’s a

on firstRun()
end firstRun

stub for now.

I’ve only had one person put their hand up to look at it pre-release. I’ll likely send it to them later today and post the first pre-final-release here in the next day or two.

Sean

PS at my retrocomputing gathering yesterday I was working on the script and an attendee asked what I was doing and I started to explain to him, and he’s now at least going to trial DEVONthink as an info management tool! He’s faced many of the same challenges I have over the years, so I was able to speak to his experience and the benefits (and few shortcomings) I’ve already seen so far. I wonder if I could get a Sydney meetup organised?!


Testing on the 2022 folder, and DEVONthink is throwing an AppleScript timeout, even with the timeout increased to 5 minutes.

To pin down what’s happening, I might add tagging to completed messages (or some other indicator of a DEVONthink record completing) so I can more easily see when it goes wrong.

Sean


That’s what happens in the other script

set tags of theRecord to (tags of theRecord) & {replacedTagName}

I also added tags to e-mails that had attachments which didn’t fulfil the criteria (small images)

set tags of theRecord to (tags of theRecord) & {notReplacedTagName}