Importing mbox files via AppleScript

I’m trying to sync a handful of labels from my Fastmail account to DTP. I don’t want to use a mail client (Mail or Thunderbird) because they mean a lot of fussy overhead and wasted disk space on my tight drive. So I’ve got a command line tool set up to download just the labels I want into mbox files, and it’s set up to run automatically on a regular basis to catch new messages. Simple and effective.

Next step: importing the messages into DTP. If I use the Import > Unix Mailbox command, it works beautifully. However, I really want this fully automated so that my message archive is always ready to go. Thus, I’m trying to use AppleScript. I didn’t see any bundled scripts that did this, and just about everything I’m finding in the forum relies on a mail UA as a critical part of the process. So I’m writing my own.

Now, I’ve been programming for 40-ish years, but I’m terrible with AppleScript; there’s just something about it that I struggle to get my brain around. As a result, I’ve spent probably close to 24 hours banging my head on this. I’ve created a number of variations of my script that got really close to doing what I want but failed in some small way that proved insurmountable.

Here’s where I want to get to:

  1. Import mbox file as a group of email messages inside an existing group. I’ll creatively call this existing target group Imported Email.

  2. The mbox includes a ton of “Outlook-xxxx.png” items from social media images that the messages have in their .sigs. I don’t care about these, so I want to move them to another existing group we’ll call Ignored Email Items. I’m doing this versus trashing them because I don’t want DTP to keep recreating them on each repeat import. (I have DTP set to skip duplicates during mail import, so I also won’t get a bunch of replicants.)
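The filtering rule in step 2 boils down to a name pattern. A minimal plain-JavaScript sketch, assuming the signature images are always named like `Outlook-xxxx.png` (the helper `partitionByName` and the sample names are hypothetical); in DEVONthink the same test would be applied to each imported record's name before moving it to Ignored Email Items:

```javascript
// Matches names like "Outlook-abc123.png" (case-insensitive).
// Assumption: the signature images always follow this naming scheme.
const IGNORED_NAME = /^Outlook-.*\.png$/i;

// Split a list of item names into those to keep and those to ignore.
function partitionByName(names) {
  const keep = [];
  const ignore = [];
  for (const name of names) {
    (IGNORED_NAME.test(name) ? ignore : keep).push(name);
  }
  return { keep, ignore };
}

const { keep, ignore } = partitionByName([
  'Re: Meeting.eml',
  'Outlook-1a2b3c.png',
  'Outlook-logo.PNG',
  'invoice.pdf',
]);
console.log(keep);   // [ 'Re: Meeting.eml', 'invoice.pdf' ]
console.log(ignore); // [ 'Outlook-1a2b3c.png', 'Outlook-logo.PNG' ]
```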

One of the big thorns I keep brushing up against is how to get the mbox file properly imported. There’s an import path ... to ... AS command that seems like it should do the trick, but it simply imports the mbox file as an opaque file, rather than importing it as a group of messages. I’m not sure if this is a bug or if it’s the wrong way to do this.

The only way I’ve found to get the import to work is the open ... AS command. That makes the import happen properly, but it ends up in the global inbox. I can move them from there to my target group in one of the databases by using more AS or a Smart Rule, but this move seems to break the duplicate tracking. I’m not clear as to why, but I suspect it’s because when the import happens, the previous import’s results have been shuttled off to my target group in another database, so DTP isn’t seeing duplicates at that time and thus not skipping those items.

Any suggestions on how to make this work? I see four possibilities:

  1. Identify a way that open ... can specify a target group rather than just dumping into the inbox. (BTW, I do know you can change the import destination in the settings, but that changes for all imports, which would break a bunch of other things I do.)
  2. Identify a way that import path ... to ... can parse and import the messages inside the mbox file instead of just dumping the mbox file itself into the group.
  3. Identify some other AS command to do this.
  4. Identify some entirely different way to do all this that may not even involve AppleScript. For example, I could resort to UI scripting, but that’s super-ugly for obvious reasons. Maybe I need to parse the mbox myself pre-import so DTP can just bring in the email messages. I have ideas, but I feel like I’ve already spent enough time chasing gold on this, so it’s time to ask the pros out there so I can get back to my other work.

I’m all ears to ideas or guidance!

I’d simply read the mbox as a text file and split it, like so (JavaScript; I don’t do AppleScript). The script must be modified to point to the correct mbox file, database, and group.
It splits the mbox file into messages and creates a record for each of them.
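The splitting step itself doesn't need DEVONthink at all. Here is a plain-JavaScript sketch of it that also drops the rest of the `From ` envelope line and undoes the `>From ` quoting used by the mboxrd variant. The function name `splitMbox` is made up for illustration, and the sketch assumes that body lines starting with `From ` were escaped when the mbox was written (otherwise no regex can tell them apart from message separators):

```javascript
// Split the text of an mbox file into individual RFC 822 messages.
// Assumes each message starts with a "From " envelope line at the start of a line
// and that body lines beginning with "From " were escaped as ">From ".
function splitMbox(mboxText) {
  return mboxText
    .split(/^From /m)
    // The first chunk is empty when the file starts with "From "
    .filter(chunk => chunk.trim().length > 0)
    .map(chunk => chunk
      // Remove the rest of the envelope line ("sender date")
      .replace(/^.*\r?\n/, '')
      // mboxrd: strip one level of ">" quoting from ">From ", ">>From ", ...
      .replace(/^>(>*From )/gm, '$1'));
}

const sample = [
  'From alice@example.com Mon Jan  1 10:00:00 2024',
  'From: Alice <alice@example.com>',
  'Subject: Hello',
  '',
  '>From the start, this line was escaped.',
  '',
  'From bob@example.com Tue Jan  2 11:00:00 2024',
  'From: Bob <bob@example.com>',
  'Subject: Re: Hello',
  '',
  'Hi!',
  '',
].join('\n');

const msgs = splitMbox(sample);
console.log(msgs.length);            // 2
console.log(msgs[0].split('\n')[0]); // From: Alice <alice@example.com>
```

Note that `From:` header lines are not treated as separators, because the split pattern requires a space directly after `From`.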

Update: The current version fixes most of the shortcomings of the first one.

What it does not do yet:

  • ~~Decode binary Base64 headers, notably subject lines~~
  • ~~Decode anything not UTF-8-encoded (e.g. Latin-1)~~
  • It does decode quoted-printable UTF-8 and Latin-1 as well as Base64 UTF-8, Latin-1, and Windows-1252 headers.
  • ~~Setting the URL to the message sender somehow does not work. Or at least it doesn’t do what I expect it to do.~~ Works as expected.
  • ~~Prevent duplicates. That’s more difficult, since createRecordWith does not use the “Message-ID” to create DT’s UUID. My idea to prevent duplicates would be to store the Message-ID in a custom metadata field and check against that. A bit awkward, but feasible.~~ Fixed by checking for the existence of the UUID before adding the record.
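The custom-metadata idea from the last bullet can be sketched in plain JavaScript: collect the Message-IDs already imported, then keep only messages whose ID is new. The `knownIDs` set stands in for whatever lookup is actually used (a custom metadata field, or `getRecordWithUuid` as in the script); `getMessageID`, `filterNewMessages`, and the sample data are hypothetical. Messages without a Message-ID can't be checked this way, so the sketch passes them through.

```javascript
// Extract the Message-ID value (angle brackets included) from a raw message.
// Only the header block (up to the first empty line) is searched.
function getMessageID(rawMessage) {
  const headerBlock = rawMessage.split(/\r?\n\r?\n/)[0];
  const match = headerBlock.match(/^Message-ID:\s*(<[^>]+>)/mi);
  return match ? match[1] : undefined;
}

// Keep messages whose Message-ID is not yet in knownIDs.
// knownIDs is updated, so duplicates within the same mbox are skipped too.
function filterNewMessages(messages, knownIDs) {
  return messages.filter(msg => {
    const id = getMessageID(msg);
    if (id === undefined) return true; // cannot deduplicate, keep it
    if (knownIDs.has(id)) return false;
    knownIDs.add(id);
    return true;
  });
}

const known = new Set(['<a1@example.com>']);
const batch = [
  'Message-ID: <a1@example.com>\nSubject: old\n\nbody',
  'Message-Id: <b2@example.com>\nSubject: new\n\nbody',
  'Message-ID: <b2@example.com>\nSubject: new again\n\nbody',
  'Subject: no id\n\nbody',
];
const fresh = filterNewMessages(batch, known);
console.log(fresh.length); // 2
```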

The call to createRecordWith is mostly stolen from a script available with DT3. This one is written for DT4.

@cgrunenberg: Is it correct to set the recordType to “unknown” instead of “email”? And is it also correct that the URL is set to a longish “mailto:” link, not just the sender? And I suppose I can’t set the UUID in the createRecordWith call the way DT does (did?) when it imports emails?

// Map upper-cased MIME charset names to Foundation string encodings.
// Add further charsets here if needed.
const encodingMapping = {
  'UTF-8' : $.NSUTF8StringEncoding,
  'ISO-8859-1' : $.NSISOLatin1StringEncoding,
  'WINDOWS-1252' : $.NSWindowsCP1252StringEncoding,
  'WINDOWS-CP1252' : $.NSWindowsCP1252StringEncoding,
};

// Decode a single RFC 2047 encoded word ("=?charset?Q?text?=" or "=?charset?B?text?=").
// Returns the word unchanged if the charset is not in encodingMapping or decoding fails.
function decodeEncodedWord(word, charset, type, text) {
  const encoding = encodingMapping[charset.toUpperCase()];
  if (!encoding) return word;

  let decoded;
  if (type.toUpperCase() === 'Q') {
    // Quoted-printable: "_" is a space, "=XX" is a hex-encoded byte.
    // Escape literal "%" first, then turn "=XX" into "%XX" and percent-decode
    // with the deprecated (but still available) NSString method.
    const percentEncoded = text.replaceAll('%', '%25').replaceAll('_', '%20')
      .replace(/=([0-9A-Fa-f]{2})/g, '%$1');
    decoded = $(percentEncoded).stringByReplacingPercentEscapesUsingEncoding(encoding);
  } else {
    // Base64: decode each word on its own so that "=" padding is handled correctly
    const nsdata = $.NSData.alloc.initWithBase64EncodedStringOptions(
      text, $.NSDataBase64DecodingIgnoreUnknownCharacters);
    if (nsdata.isNil()) return word;
    decoded = $.NSString.alloc.initWithDataEncoding(nsdata, encoding);
  }
  return decoded.isNil() ? word : decoded.js;
}

// Return the decoded value of a header, or undefined if the header is missing.
function getHeader(msg, header) {
  // Search only the header block (up to the first empty line) and unfold
  // continuation lines, which start with a space or tab
  const headerBlock = msg.split(/\r?\n\r?\n/)[0].replace(/\r?\n[ \t]+/g, ' ');
  const headerRE = new RegExp(`^${header}:[ \\t]*(.*?)\\r?$`, 'mi');

  const match = headerBlock.match(headerRE);
  if (!match) {
    console.log(`Header "${header}" not found`);
    return undefined;
  }

  // Raw header value, can contain RFC 2047 encoded words. Whitespace between
  // two adjacent encoded words is not part of the text, so remove it first.
  return match[1]
    .replace(/(\?=)\s+(?==\?)/g, '$1')
    .replace(/=\?([^?]+)\?([QqBb])\?([^?]*)\?=/g, decodeEncodedWord);
}

const DT = Application("DEVONthink");
const curApp = Application.currentApplication();
curApp.includeStandardAdditions = true;

// Adjust mbox path, database and group to your setup
const mboxPath = '/Users/ck/Desktop/INBOX/Urlaubsmails.mbox/mbox';
const mboxFile = curApp.openForAccess(mboxPath);
const mboxContent = curApp.read(mboxFile);
curApp.closeAccess(mboxFile);

// Every message starts with a "From " envelope line. This relies on body lines
// beginning with "From " being escaped as ">From " in the mbox.
const messages = mboxContent.split(/^From /m);
const db = DT.databases["Test"];
const group = DT.createLocation("/email", {in: db});

messages.filter(m => m.trim().length > 0).forEach(m => {

  // Drop the rest of the envelope line and undo mboxrd ">From " quoting
  const source = m.replace(/^.*\r?\n/, '').replace(/^>(>*From )/gm, '$1');

  // Get header values
  const subject = getHeader(source, 'Subject') || '(no subject)';
  const sender = getHeader(source, 'From');
  const date = new Date(getHeader(source, 'Date'));
  const rawID = getHeader(source, 'Message-ID');

  // Skip messages that are already in DT; with record type "email", DT derives
  // the UUID from the Message-ID. Messages without a Message-ID can't be checked
  // and will be imported again on every run.
  const messageID = rawID ? encodeURIComponent(rawID) : undefined;
  if (messageID && DT.getRecordWithUuid(messageID)) return;

  const props = {
    'record type': 'email',
    name: subject + '.eml',
    source: source,
    URL: sender,
  };
  // Only set the creation date if the Date header could be parsed
  if (!isNaN(date)) props['creation date'] = date;

  DT.createRecordWith(props, {in: group});
});
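To try the header decoding outside DEVONthink, it can also be written in plain JavaScript (Node.js 16 or later, or a browser) with `TextDecoder` and `atob` instead of the Cocoa bridge. This sketch decodes each RFC 2047 encoded word separately, which avoids trouble with `=` padding when a long subject is split across several Base64 words. It is not a drop-in replacement for the JXA code: `decodeMimeHeader` is a hypothetical name, and the supported charsets are whatever `TextDecoder` accepts in the runtime.

```javascript
// Decode RFC 2047 encoded words ("=?charset?Q?...?=" / "=?charset?B?...?=")
// in a raw header value. Plain text around the encoded words is kept as is.
function decodeMimeHeader(value) {
  return value
    // Whitespace between two adjacent encoded words is not part of the text
    .replace(/(\?=)\s+(?==\?)/g, '$1')
    .replace(/=\?([^?]+)\?([QqBb])\?([^?]*)\?=/g, (word, charset, type, text) => {
      try {
        let bytes;
        if (type.toUpperCase() === 'Q') {
          // "_" is a space, "=XX" is a hex-encoded byte, anything else is literal
          const raw = text.replace(/_/g, ' ');
          const out = [];
          for (let i = 0; i < raw.length; i++) {
            const hex = raw.slice(i + 1, i + 3);
            if (raw[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
              out.push(parseInt(hex, 16));
              i += 2;
            } else {
              out.push(raw.charCodeAt(i));
            }
          }
          bytes = Uint8Array.from(out);
        } else {
          // Decode each Base64 word separately so "=" padding stays valid
          bytes = Uint8Array.from(atob(text), c => c.charCodeAt(0));
        }
        // TextDecoder throws a RangeError for unknown charset labels
        return new TextDecoder(charset).decode(bytes);
      } catch (e) {
        return word; // unknown charset or invalid Base64: keep the word unchanged
      }
    });
}

console.log(decodeMimeHeader('Re: =?UTF-8?Q?Gr=C3=BC=C3=9Fe_aus_K=C3=B6ln?='));
// Re: Grüße aus Köln
console.log(decodeMimeHeader('=?UTF-8?Q?Urlaub?= =?UTF-8?Q?sgr=C3=BC=C3=9Fe?='));
// Urlaubsgrüße
```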

That’s still supported but email is now recommended.

Yes.

You can’t set UUIDs but in this case it’s internally handled anyway.

… if one uses record type “email”. With “unknown”, the UUID is different from the message ID, which is OK, I guess.

“Add message(s) to DEVONthink” still uses “unknown”, and the UUID is set as expected; I just checked this.

I’ve updated the script and fixed most of its shortcomings. Tested with a 3 MB mbox file containing 34 messages. Two of those caused DT error messages because DT couldn’t recognize the message format. That seems to be due to a bad character-set header in the original messages, or something similar.
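To find out which messages carry a questionable character-set declaration, one option is to pull the `charset` parameter out of each message's top-level `Content-Type` header and check whether it is a label the runtime knows. A plain-JavaScript sketch, assuming `TextDecoder` is available (Node.js or a browser); it only inspects the top-level header, not the charsets of individual MIME parts, and `checkCharset` is a hypothetical helper:

```javascript
// Return the declared charset of a raw message's top-level Content-Type header
// and whether the runtime's TextDecoder recognizes that label.
function checkCharset(rawMessage) {
  // Header block only, with folded continuation lines joined
  const headers = rawMessage.split(/\r?\n\r?\n/)[0].replace(/\r?\n[ \t]+/g, ' ');
  const contentType = headers.match(/^Content-Type:(.*)$/mi);
  // No Content-Type or no charset parameter: MIME defaults to US-ASCII
  if (!contentType) return { charset: undefined, known: true };
  const param = contentType[1].match(/charset\s*=\s*"?([^";\s]+)"?/i);
  if (!param) return { charset: undefined, known: true };

  const charset = param[1];
  try {
    new TextDecoder(charset); // throws RangeError for unknown labels
    return { charset, known: true };
  } catch (e) {
    return { charset, known: false };
  }
}

console.log(checkCharset('Content-Type: text/plain; charset="utf-8"\n\nHi'));
// { charset: 'utf-8', known: true }
console.log(checkCharset('Content-Type: text/plain;\n charset=x-bogus\n\nHi'));
// { charset: 'x-bogus', known: false }
```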

As to speed: The script takes about a second to process the 34 messages. I don’t know whether that is fast enough or not.