Importing E-mail and Attachments via AppleScript

Another possibly silly follow-up question: is there any easy way to set up an auto-forwarding rule that forwards all my e-mail to something DT could receive, file, and store for me? The import options seem either limited or difficult. Simply being able to set up an auto-forwarder would make it much easier to take in all those messages, if DT could receive them.

Oh, cool – I hadn’t looked at one of those links in a while. Using the Message-ID, which equals the UUID in DEVONthink, makes that kind of linking super easy.

It’s annoying that you can’t use it to select or refer to an e-mail in AppleScript, though: it opens the e-mail, and you’d then have to look up the message details from that window and close it. I may test that at some stage. Who knows, it may be faster than what I’m doing, and it certainly removes one selection task for the operator (or it may be a reliable way to refer to an e-mail from within an automatically triggered script).

Sean

Thanks for those.

Current version of the script automatically generates an import .mbox filename from the group which is being scanned, and creates a child group (or utilises an existing child group) of that group named [groupname] Attachments (if there is more than 1 such group, a group selector is shown).

So now it’s just choose Mail mailbox where the originals are, and choose a DEVONthink group to scan.

Wish I could find e-mails in Mail directly by UUID – next up is testing how slow that might be if it’s at all possible via opening an e-mail’s URL in Mail, grabbing attributes from that window, then closing.
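For reference, the URL Mail responds to is built from the Message-ID in much the same way as DEVONthink's item link. A minimal JavaScript sketch of that construction, assuming the `message://%3C…%3E` form (the angle-bracketed Message-ID, percent-encoded) that Mail's `message:` links use; `mailURLFromMessageId` is a hypothetical helper, not part of any API:

```javascript
// Build a Mail "message:" URL from a Message-ID.
// Assumption: Mail accepts message://<percent-encoded "<id>"> URLs.
function mailURLFromMessageId(messageId) {
  // Strip any angle brackets the caller left in, then add exactly one pair back
  const bare = messageId.replace(/[<>]/g, '');
  return 'message://' + encodeURIComponent('<' + bare + '>');
}

const url = mailURLFromMessageId('CE12711E-E84B-4EC8-9E82-632A7E67D5EB@example.com');
console.log(url);
// message://%3CCE12711E-E84B-4EC8-9E82-632A7E67D5EB%40example.com%3E
```

Building the URL is the easy part; whether the window Mail opens for it exposes anything useful to scripting is the open question being tested here.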

Then I’ll test converting dates/times to GMT via do shell script, to compare its speed with the Foundation framework method.

After that, I’ll start poring over it to see if I can find some speed-ups/rationalisations.

Sean

I feel like this is going to be a dead end (except for the solution of just trawling through all mailboxes and returning the first one with a match on the Message-ID).

The document attribute of a window displaying an “opened” e-mail is missing value. The window name is the Subject plus the enclosing mail folder name, but that’s not good enough to be useful, IMO.

Moving on…

Sean

Apple’s scripting dictionary exposes a message id for each message. As you’d explained to me before, that should be the UUID used by DT, so a nicely crafted whose expression might be able to find a message by UUID.

As it stands, a message belongs to a mailbox. Consequently, you’d have to loop over all accounts, then their mailboxes and run a whose on their messages to find the corresponding message. That’s certainly a PITA.

See this: referencing a message with only "message id" in "Mail" - AppleScript | Mac OS X - MacScripter

Sample code in JavaScript:

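(() => {
  // Example DEVONthink item link for an e-mail record
  const dtURL = 'x-devonthink-item://%3CCE12711E-E84B-4EC8-9E82-632A7E67D5EB%40example.com%3E';
  // Strip the scheme, URL-decode, and drop the enclosing <> to get Mail's message id
  const messageID = decodeURIComponent(dtURL.replace(/^.*:\/\//,'')).replaceAll(/[<>]/g,'');
  console.log(messageID);
  const mailApp = Application("Mail");
  // A message belongs to a mailbox, so search every mailbox of every account
  mailApp.accounts().forEach(a => a.mailboxes().forEach(m => {
    const result = m.messages.whose({messageId: messageID});
    if (result.length) {
      // Print the raw source of the first match in this mailbox
      console.log(result[0].source());
    }
  }));
})()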

The example from MacScripter also loops over all mailboxes that aren’t in any account. I am not sure whether that’s a necessary step here.
Note: The “raw” UUID from DT has to be massaged so that it matches the message id of Apple’s Mail (which is not the original message ID in the mail header!):

  • First, DT’s item URL must be URL-decoded;
  • then the leading x-devonthink-item:// must be removed;
  • finally, the enclosing <> must be removed.

(not necessarily in this sequence). All of that happens in this line of the code above:

const messageID = decodeURIComponent(dtURL.replace(/^.*:\/\//,'')).replaceAll(/[<>]/g,'');
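Going the other way is just as short. A sketch of both directions as hypothetical helpers (`messageIdFromDTURL`, `dtURLFromMessageId`), assuming DEVONthink always writes the percent-encoding in the uppercase form shown in the example link (if not, compare links case-insensitively):

```javascript
// Convert between a DEVONthink item link for an e-mail record and the bare
// message id that Mail's scripting dictionary reports (no angle brackets).
function messageIdFromDTURL(dtURL) {
  // Drop the scheme, URL-decode, then remove the enclosing <>
  return decodeURIComponent(dtURL.replace(/^.*:\/\//, '')).replace(/[<>]/g, '');
}

function dtURLFromMessageId(messageId) {
  // Re-add the <> and percent-encode them (and the @) for the item link
  return 'x-devonthink-item://' + encodeURIComponent('<' + messageId + '>');
}

const link = 'x-devonthink-item://%3CCE12711E-E84B-4EC8-9E82-632A7E67D5EB%40example.com%3E';
const id = messageIdFromDTURL(link);
console.log(id);                              // CE12711E-E84B-4EC8-9E82-632A7E67D5EB@example.com
console.log(dtURLFromMessageId(id) === link); // true
```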

Thanks, I was aware of these issues, hence my more recent message:

I had been hoping that if Apple Mail quickly opens a message from the URL built from the Message-ID (very similar to DEVONthink’s URL), I might be able to use that as a shortcut to get the message’s attributes/properties (and avoid needing to specify the mailbox in the script). That doesn’t work (as I’m sure has been determined and discussed previously) :frowning:

It is a necessary step if you have local mailboxes. Account mailboxes are things like online IMAP mailboxes; local mailboxes (those without an account attribute when scripting) are not stored online.

I used to use local mailboxes for my archives because of storage limits on earlier IMAP servers, but they lived only on the Mac. Now that I have so much iCloud space, I store them on the iCloud IMAP server rather than locally, so my other devices (and iCloud webmail) can see them.

Sean

The Foundation framework is much quicker: over 100 translations it added effectively zero overhead, whereas the do shell script method added 44 seconds of overhead for the same 100 translations.

So I stick with Foundation framework for that.
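The conversion being timed is turning a message date into the GMT/UTC timestamp used on an mbox `From ` separator line. As an in-process illustration (no shell process, much like the Foundation route), here is a JavaScript sketch assuming the asctime-style UTC timestamp that RFC 4155 describes; `mboxFromDate` and the sender address are hypothetical:

```javascript
// Format a Date as an asctime-style UTC timestamp, e.g. "Thu Jan  1 00:00:00 2015",
// as used on mbox "From " separator lines. The offset maths happens in-process.
function mboxFromDate(date) {
  const days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
  const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
  const pad2 = n => String(n).padStart(2, '0');
  return days[date.getUTCDay()] + ' ' +
         months[date.getUTCMonth()] + ' ' +
         String(date.getUTCDate()).padStart(2, ' ') + ' ' +   // day of month is space-padded
         pad2(date.getUTCHours()) + ':' + pad2(date.getUTCMinutes()) + ':' +
         pad2(date.getUTCSeconds()) + ' ' + date.getUTCFullYear();
}

// A message sent at 10:15:30 on 1 January 2015 in UTC+11
const sent = new Date('2015-01-01T10:15:30+11:00');
console.log('From sender@example.com ' + mboxFromDate(sent));
// From sender@example.com Wed Dec 31 23:15:30 2014
```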

Sean

Searching by message ID is more difficult because the same message can be stored in several mailboxes. Although the message ID should be unique, that only refers to its origin, i.e. the sending server. Once it has arrived, the user can copy it to different mailboxes as often as they wish. Perhaps there are clients that track messages by their ID internally, but I doubt that (not very useful, I guess). Mail definitely doesn’t do it.

And even IMAP doesn’t offer a server-wide search – it always refers to a mailbox: RFC 3501 - INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1
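Because SEARCH only runs against the currently selected mailbox, a client looking for one Message-ID across a whole server has to open each mailbox in turn. A sketch that just builds that command sequence (no network code), assuming plain ASCII mailbox names; `perMailboxSearchCommands`, the `a1`, `a2`, … tags, and the mailbox names are illustrative choices:

```javascript
// Build the IMAP commands needed to look for one Message-ID in several mailboxes.
// EXAMINE is the read-only form of SELECT; SEARCH then applies to that mailbox only.
function perMailboxSearchCommands(mailboxNames, messageId) {
  // IMAP quoted string: escape backslashes and double quotes.
  // Assumption: ASCII names only (no modified UTF-7 encoding is done here).
  const quote = s => '"' + s.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
  const commands = [];
  let tag = 1;
  for (const name of mailboxNames) {
    commands.push(`a${tag++} EXAMINE ${quote(name)}`);
    commands.push(`a${tag++} SEARCH HEADER Message-ID ${quote(messageId)}`);
  }
  return commands;
}

perMailboxSearchCommands(['INBOX', 'Archive/2010'], '<CE12711E@example.com>')
  .forEach(line => console.log(line));
// a1 EXAMINE "INBOX"
// a2 SEARCH HEADER Message-ID "<CE12711E@example.com>"
// a3 EXAMINE "Archive/2010"
// a4 SEARCH HEADER Message-ID "<CE12711E@example.com>"
```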

Just curious: what happens when a message is copied to another mailbox and its attachments are removed there (manually)? If searching for the UUID turns up this message as the first match, the attachments are gone…

Not surprising, since doShellScript has to spawn a process, which takes more time than a procedure call. That’s one reason I try to avoid that command whenever I can.


Unless you ask for all messages whose message id is <xyz>, I guess it will return whichever is the first matching message.

Mail definitely doesn’t preclude duplicate messages the way DEVONthink does. I have imported messages with removed attachments into DEVONthink’s Global Inbox while the original full message is still in my e-mail archive database, but I’d have to delete the original full message before moving (or importing) the truncated message into the e-mail archive database, so that it takes over the identity of that message ID. Otherwise it won’t be imported, or it will be imported as a replicant of the original message (depending on settings).

Sean

As I understand it, whose always returns a list of all matches. At least it does in JXA.
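Since whose hands back every match, a script can see duplicates across mailboxes if it collects all of them instead of stopping at the first. A plain JavaScript sketch of that collection step, using a hypothetical in-memory stand-in for Mail's accounts, mailboxes, and messages (in JXA the inner test would be `mailbox.messages.whose({messageId: id})`), with local mailboxes modelled as one extra pseudo-account:

```javascript
// Collect every copy of a message across all accounts and mailboxes.
// Plain objects stand in for Mail's scripting objects (hypothetical model).
function findAllByMessageId(accounts, messageId) {
  const matches = [];
  for (const account of accounts) {
    for (const mailbox of account.mailboxes) {
      for (const message of mailbox.messages) {
        if (message.messageId === messageId) {
          matches.push({ account: account.name, mailbox: mailbox.name, message });
        }
      }
    }
  }
  return matches;
}

const accounts = [
  { name: 'iCloud', mailboxes: [
      { name: 'INBOX',   messages: [{ messageId: 'x@example.com', attachments: 2 }] },
      { name: 'Archive', messages: [{ messageId: 'x@example.com', attachments: 0 }] } ] },
  // Local mailboxes have no account, so model them as a pseudo-account
  { name: 'On My Mac', mailboxes: [
      { name: 'Old', messages: [{ messageId: 'y@example.com', attachments: 1 }] } ] },
];

const copies = findAllByMessageId(accounts, 'x@example.com');
copies.forEach(c => console.log(`${c.account}/${c.mailbox}: ${c.message.attachments} attachment(s)`));
// iCloud/INBOX: 2 attachment(s)
// iCloud/Archive: 0 attachment(s)
```

Returning every copy lets the caller choose, for example, the copy that still has its attachments rather than whichever one happens to come first.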

Next edge case – attachments which have not, for whatever reason, been loaded/downloaded into Mail.

155 of them in 2015-2017 for me.

DEVONthink then complains about file errors, and won’t validate/repair the database.

Not too big a deal, a proportion of those (I don’t know how many yet) will have already existing copies in my main database (as I created them or saved them when I received them), or duplicates in the mail archive database.

So I’m just plowing on for now before I decide if I explicitly do something (yet to be determined) with “empty” attachments.

I’ll have more funky edge cases the older I get, as I encounter messages I imported (in some long-forgotten way) from TeleFinder BBS, while hopefully the Pine and Eudora ones will fare better, coming from .mbox imports (although I suspect the Eudora ones may well have had the attachments stripped from the mail source already).

Maybe I’ll bite the bullet and start approaching 2014 (the one currently being processed) from “the other side” and see how bad the situation is :slight_smile:

Sean

Forgot to mention: such “missing” attachments become zero-byte files when I use the import attachments command, and that seems to upset DEVONthink a little.
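The keep/delete decision the script makes for each imported attachment (pictures under `minPictureSize` are deleted; everything else is kept and linked back to the e-mail) could set aside zero-byte “missing” attachments first. A JavaScript sketch of that decision, assuming each imported attachment is described by a hypothetical `{name, type, size}` object; the `'empty'` category is an illustrative addition, not something the script currently does:

```javascript
const MIN_PICTURE_SIZE = 50000; // bytes, same threshold as minPictureSize in the script

// Decide what to do with one imported attachment:
// 'empty'  -> zero-byte file (attachment never downloaded into Mail)
// 'delete' -> small picture, not worth keeping separately
// 'keep'   -> keep in DEVONthink and link back to the e-mail
function classifyAttachment({ type, size }) {
  if (size === 0) return 'empty';
  if (type === 'picture' && size < MIN_PICTURE_SIZE) return 'delete';
  return 'keep';
}

const imported = [
  { name: 'logo.png',    type: 'picture',      size: 4200 },
  { name: 'photo.jpg',   type: 'picture',      size: 812000 },
  { name: 'invoice.pdf', type: 'PDF document', size: 0 },
  { name: 'report.pdf',  type: 'PDF document', size: 153000 },
];
for (const a of imported) console.log(a.name, '->', classifyAttachment(a));
// logo.png -> delete
// photo.jpg -> keep
// invoice.pdf -> empty
// report.pdf -> keep
```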

Anyway, after reading @chrillek’s thread about a rejig of @mdbraber’s “all in DEVONthink using RTF” method, and their thread requesting access to an e-mail record’s source through scripting, I’m testing the latter instead of getting the source from Mail.

Biggest advantages are that:

  • I’m only ever in one application
    • Which means I don’t need to select, guess, or reference any given mailbox to find and then work on the e-mail’s source
  • I won’t need the Foundation framework for the “figure out the GMT offset for any date” procedure call I was using to recreate the e-mail’s first line (that line isn’t available through Mail’s AppleScript dictionary, but it is in the DEVONthink e-mail record’s source), so I’m only ever in plain AppleScript
  • I don’t need to test whether a message from Mail has attachments or not (I was doing that as a double check in Mail, possibly superfluously), as I’m working on the record that I’ve already tested that for (and only on messages which have attachments being kept in DEVONthink).

I’m just finalising the rejig; I’ll run it against the most recent mail folder, which is currently being processed the original way, and compare timing. I’m expecting it to be noticeably faster, and it is certainly a much shorter script.

Maybe I’ll have time to include some tagging; I’ve been holding off on that for the moment so I don’t have to learn “yet another thing” right now.

Sean

Definitely faster (maybe twice as fast – still not “speedy”, but that’s fine, as it will rarely be run against this many messages).

Took the shortcut of just assigning an AttachmentsDeleted tag as I process, so I can easily find and delete those messages before importing the truncated messages in the .mbox – but then I thought “Oh, I haven’t tagged all the other messages I’ve already processed!” :frowning:

DEVONthink developers to the rescue again – I can create a Smart Group that finds all e-mail in the e-mail archive with incoming links.

Or I can display that as a column in list view… or I can work on it in AppleScript, etc.

So there are many ways for me to identify the already-processed e-mails whose attachments are deleted in the .mbox: the attachments kept in DEVONthink have links pointing to their messages, which DEVONthink sensibly counts as incoming links.
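The incoming-link trick works because each kept attachment's URL field holds the e-mail's `x-devonthink-item://` reference URL. The same idea as a plain JavaScript sketch, assuming hypothetical `{uuid}` and `{name, url}` objects rather than DEVONthink's actual scripting interface: collect the UUIDs that some attachment points at, and those are the already-processed e-mails.

```javascript
// Return the set of e-mail UUIDs that at least one record links to
// through an x-devonthink-item:// URL in its URL field.
function linkedEmailUUIDs(records) {
  const prefix = 'x-devonthink-item://';
  const linked = new Set();
  for (const r of records) {
    if (r.url && r.url.startsWith(prefix)) {
      linked.add(decodeURIComponent(r.url.slice(prefix.length)));
    }
  }
  return linked;
}

const emails = [{ uuid: '<a@example.com>' }, { uuid: '<b@example.com>' }];
const attachments = [
  { name: 'photo.jpg',  url: 'x-devonthink-item://%3Ca%40example.com%3E' },
  { name: 'report.pdf', url: 'x-devonthink-item://%3Ca%40example.com%3E' },
];
const linked = linkedEmailUUIDs(attachments);
const processed = emails.filter(e => linked.has(e.uuid));
console.log(processed.map(e => e.uuid)); // [ '<a@example.com>' ]
```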

Thanks (again) to the developers for a great product with a deep feature set!

Sean

In what respect?

I cancelled an import at one point, then closed DEVONthink and opened it again.

When I did that it reported a database verification failure, and that I should do a repair.

I tried that, and it complained:

I tried Repair, and it reported:

The Log shows error messages like:

[Screenshot: Log error messages]

Looking at that type of file, I find these:

[Screenshot: the affected files]

Which look like this in e-mail view:

[Screenshot: the same files in e-mail view]

And once I deleted the files, it stopped complaining.

It’s not what I consider a bug in DEVONthink – except maybe it shouldn’t import empty attachments? Or shouldn’t keep empty imported attachments?

Sean

Usually DEVONthink imports every file unmodified, even empty ones. But the verification considers certain empty files to be most likely broken (e.g. empty PDF documents).

Would it be possible to delete those files imported by DEVONthink that it considers broken?

Or for verification to actually state what the error is.

With no error message, it’s a little hard to know something has gone “wrong”.

For those who don’t have the Log showing all the time, it can seem unsettling, and I don’t think it should be a requirement to always show the Log.

But maybe that’s just me.

Thanks.

Sean

The error is shown in the Log for each file (as the issue might vary). And via the contextual menu it’s also possible to trash them.


Here’s the current version. I’m hoping to rationalise a few things over the weekend (maybe create some handlers, and check the comments after recent changes), but otherwise it likely won’t change much until I finish processing my remaining mail folders (1990s and 2000s) and deal with anything weird those years throw my way.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- Define size of inline attachments above which we keep the imported attachments and delete the MIME part
set minPictureSize to 50000 -- bytes; if this is changed, minbase64size below still rounds up to the next multiple of 4

-- Figure out the base64-encoded size of minPictureSize – we'll add line breaks later (technically, line lengths won't always be 76)
set minbase64size to (round (minPictureSize / 3) rounding up) * 4

-- Let's set up our MIME part delimiting stuff
set partDelimiter to "\n--"

set currentRecordNum to 0

tell application id "DNtp"
	activate
	
	-- Let's set the group for messages
	set emailsGroup to display group selector "Select a group with e-mails:" buttons {"Cancel", "OK"}
	set emailsGroupName to the name of emailsGroup
	
	--set .mbox name based on emailGroupName and delete existing file
	set mboxFileName to emailsGroupName & " Import.mbox"
	set mboxFile to (((path to desktop folder) as string) & mboxFileName)
	
	try
		tell application "Finder" to delete file mboxFile
	end try
	
	-- Let's set the group for attachments
	set attachmentsGroupName to emailsGroupName & " Attachments"
	
	-- Find any existing child groups with that name once (a whose filter returns a list, even for a single match)
	set matchingGroups to (children of emailsGroup whose name is attachmentsGroupName)
	if (count matchingGroups) = 0 then
		set attachmentsGroup to create record with {name:attachmentsGroupName, type:group} in emailsGroup
	else if (count matchingGroups) = 1 then
		set attachmentsGroup to item 1 of matchingGroups
	else
		set attachmentsGroup to display group selector "Select a group for attachments:" buttons {"Cancel", "OK"}
	end if
	
	-- Set up some variables for the progress indicator
	set countRecords to (count children of emailsGroup)
	set startTime to current date
	
	set elapsedTimeHMS to ""
	
	-- Let's go!
	try
		-- Where are we up to?
		show progress indicator "Separating attachments…" steps (countRecords)
		
		-- Go through the group with the e-mails
		repeat with currentItem in (children of emailsGroup)
			
			--Which record are we at?
			set currentRecordNum to currentRecordNum + 1
			
			-- Some more progress indicator stuff once we've passsed one record processed
			if currentRecordNum > 1 then
				set elapsedTime to round ((current date) - startTime)
				set perItemTime to round (elapsedTime / (currentRecordNum - 1))
				set totalEstTime to elapsedTime + perItemTime * (countRecords - currentRecordNum)
				set elapsedTimeHMS to my secondsToHMS(elapsedTime)
				--set elapsedTimeHMS to secondsToHMS from elapsedTime
				set totalEstTimeHMS to my secondsToHMS(totalEstTime)
				set timeProgressString to " (" & elapsedTimeHMS & "/" & totalEstTimeHMS & ")"
			else -- or for first record
				set timeProgressString to ""
			end if
			-- And show the calculated progress numbers and current record name
			step progress indicator (currentRecordNum as string) & "/" & (countRecords as string) & timeProgressString & ": " & name of currentItem
			
			-- Only process records with attachments
			if attachment count of currentItem > 0 then
				
				-- new message, nothing deleted yet
				set attachmentsDeleted to false
				
				-- We need the DEVONthink item link because we'll put that into the URL field of the imported attachments we're keeping in DEVONthink
				set recordRefURL to the reference URL of currentItem
				
				-- Now we import the attachments and hold onto a reference to them all
				set importedAttachments to import attachments of record currentItem to attachmentsGroup
				
				-- Looping through those attachments, we'll delete pictures smaller than 50,000 bytes, and set the URL to the e-mail for everything else
				repeat with currentAttachment in importedAttachments
					if (the record type of currentAttachment) as string = "picture" and (the size of currentAttachment) < minPictureSize then
						delete record currentAttachment
					else
						set the URL of currentAttachment to recordRefURL
						
						-- We also tag this item to record that its source data needs work (any time we keep an attachment in DEVONthink, we want to delete it from the mail source and save that modified source in the mbox file)
						set attachmentsDeleted to true
						set tags of currentItem to {"AttachmentsDeleted"}
					end if
				end repeat
				
				-- Once attachments processed, if any were kept, let's work on the e-mail for inclusion in the .mbox file
				if attachmentsDeleted then
					set recordPath to the path of currentItem
					set messageMboxText to ""
					set messageSource to (read POSIX file recordPath)
					
					-- We do this so we can split the message by MIME parts (each text item is a MIME part)
					set AppleScript's text item delimiters to partDelimiter
					set messageParts to every text item of messageSource
					set countMessageParts to count of messageParts
					
					-- Grab everything up to the actual first MIME part
					set messagePreamble to item 1 of messageParts
					
					-- Grab the last MIME part (which will be a boundary ending spec)
					set messagePostamble to item countMessageParts of messageParts
					
					-- Start the .mbox data with the Preamble
					set messageMboxText to messageMboxText & messagePreamble & partDelimiter
					
					-- Loop through everything except the first and last parts
					repeat with partNumber from 2 to (countMessageParts - 1)
						-- Now we inspect and act on each part as necessary
						set currentPart to item partNumber of messageParts
						
						-- Attachments will always have the encoded part after two newlines, so now we split the currentPart using two newlines
						-- ** This may be able to be swapped around with the Content-Type tests following
						set AppleScript's text item delimiters to "\n\n"
						
						-- Is there more than one section to this MIME part separated by two newlines?
						-- Let's do something if there is
						if (count text items of currentPart) > 1 then
							-- Let's test for the inline Content-Types we want to consider actions for
							
							-- Firstly inline images
							if ((text item 1 of currentPart contains "Content-Type: image/")) and (text item 1 of currentPart contains "Content-Disposition: inline") then
								-- grab the base64 data
								
								-- This is for when inline or image parts don't have a second interpretable part 2025-07-28
								try
									set encodedData to text item 2 of currentPart
									
									-- Check the line length of the base64 encoded data
									set b64LineLength to count characters of (first paragraph of encodedData)
									
									-- Add the base minimum base64 encoded size to the appropriate number of newline characters
									set minMIMEEncodedSize to minbase64size + (round (minbase64size / b64LineLength))
									
									-- Grab the size of the attachment's encoded data
									set currentPartSize to count characters of encodedData
									
									-- Is the encoded data smaller than our minimum size?
									if currentPartSize < minMIMEEncodedSize then
										-- If so, write that small encoded data into our .mbox variable
										set messageMboxText to messageMboxText & currentPart & partDelimiter
									end if
								on error
									set messageMboxText to messageMboxText & currentPart & partDelimiter
								end try
								-- For non-inline images, or application/ Content-Types
							else if ((text item 1 of currentPart contains "Content-Type: image/") or (text item 1 of currentPart contains "Content-Type: application/")) then
								-- Do nothing - i.e. don't add them to the .mbox variable
								
								-- And for any other part with a header/body split that doesn't match the Content-Type tests and exclusions above,
								-- just write the part to the .mbox variable (usually text/plain and text/html, for example)
							else
								set messageMboxText to messageMboxText & currentPart & partDelimiter
							end if
							
							-- And for "single" item MIME parts (i.e. no double newlines), just write the part to .mbox variable
						else
							set messageMboxText to messageMboxText & currentPart & partDelimiter
						end if
						
						-- Finish loop of message parts
						
						set AppleScript's text item delimiters to ""
					end repeat
					
					-- We've inspected all the parts and taken appropriate action
					-- let's write the last MIME part (last boundary closure)
					set messageMboxText to messageMboxText & messagePostamble
					
					
					-- ** Set up write to .mbox file here
					try
						set openMboxFile to open for access file mboxFile with write permission
						write (messageMboxText & "\n\n") to openMboxFile starting at eof
						close access openMboxFile
					on error
						try
							close access openMboxFile
						end try
					end try
					
					-- Finish with messages where attachments have been deleted
				end if
				
			end if
			
			-- End loop over the children of the e-mails group
		end repeat
		
		hide progress indicator
		
		--end try
		
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
	end try
	
	-- Exit DEVONthink tell loop
end tell

--end timeout

beep 4
display dialog (currentRecordNum as string) & " records in " & my secondsToHMS(round ((current date) - startTime))
-- And we're done!

on secondsToHMS(theSecs)
	set h to theSecs div 3600
	set s to theSecs - h * 3600
	set m to s div 60
	set s to s - m * 60
	-- Use text ranges (not character lists) so the result doesn't depend on AppleScript's text item delimiters
	set HMSSTring to (text -2 thru -1 of ((h + 100) as string)) & ":" & (text -2 thru -1 of ((m + 100) as string)) & ":" & (text -2 thru -1 of ((s + 100) as string))
	return HMSSTring
end secondsToHMS

on firstRun()
end firstRun
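For anyone who wants to follow the MIME-part filtering outside AppleScript, here is an illustrative JavaScript sketch of the same keep/drop rules. It assumes LF line endings and the same simple split on `"\n--"` that the script uses (rather than parsing the real boundary string), and it lower-cases headers because AppleScript's `contains` ignores case by default. `keepPart` and `truncateMessage` are hypothetical names:

```javascript
// Illustrative JavaScript version of the script's MIME-part filter.
// Assumptions: LF line endings and a naive split on "\n--", not a real MIME parser.
const MIN_PICTURE_SIZE = 50000;                               // bytes, as minPictureSize
const MIN_BASE64_SIZE = Math.ceil(MIN_PICTURE_SIZE / 3) * 4;  // 3 bytes -> 4 base64 chars

// Should this MIME part stay in the truncated message written to the .mbox?
function keepPart(part) {
  const sections = part.split('\n\n');
  if (sections.length < 2) return true;            // no header/body split: keep as-is
  const header = sections[0].toLowerCase();        // case-insensitive, like AppleScript's contains
  const isImage = header.includes('content-type: image/');
  if (isImage && header.includes('content-disposition: inline')) {
    const encoded = sections[1];
    const lineLength = encoded.split(/\r\n|\r|\n/)[0].length;
    if (lineLength === 0) return true;             // mirrors the script's "on error" branch
    // Allow one line break per encoded line, as minMIMEEncodedSize does
    const minEncodedSize = MIN_BASE64_SIZE + Math.round(MIN_BASE64_SIZE / lineLength);
    return encoded.length < minEncodedSize;        // small inline images stay in the message
  }
  if (isImage || header.includes('content-type: application/')) {
    return false;                                  // dropped from the .mbox, as in the script
  }
  return true;                                     // text/plain, text/html, nested multipart headers, ...
}

// Rebuild the message from its preamble, the kept parts, and the closing part
function truncateMessage(source) {
  const delimiter = '\n--';
  const parts = source.split(delimiter);
  if (parts.length < 3) return source;             // nothing between preamble and closing part
  const kept = parts.slice(1, -1).filter(keepPart);
  return [parts[0], ...kept, parts[parts.length - 1]].join(delimiter);
}

const sample = [
  'Subject: test\nContent-Type: multipart/mixed; boundary="b"\n',
  'b\nContent-Type: text/plain\n\nHello',
  'b\nContent-Type: application/pdf\nContent-Disposition: attachment\n\nJVBERi0=',
  'b--\n',
].join('\n--');

const truncated = truncateMessage(sample);
console.log(truncated.includes('Hello'));           // true
console.log(truncated.includes('application/pdf')); // false
```

With the script's 50,000-byte threshold, `MIN_BASE64_SIZE` works out to 66,668 characters, and with 76-character base64 lines the cut-off for inline images is 66,668 + 877 = 67,545 characters.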

Did some speed testing tonight, and I gotta say, after comparing, I’m pretty chuffed with the performance.

500 random e-mails from 2010 were processed in 3m52s. 102 of those had attachments, and my script processed just those 102 e-mails in 3m41s, so checking the extra 398 e-mails adds little burden compared with filtering for, or selecting, only attachment-laden messages.

I’ll leave others to test on their own data sets and computers, but for the above data sets on my computer, that compares with 18m00s for the 500 e-mails and 14m00s for the 102 e-mails in JXA (both JXA results were timed manually and rounded down slightly to the nearest minute; my script times itself).
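The script's self-timing (elapsed time, per-item average, projected total, shown as HH:MM:SS) can be sketched in JavaScript as below; `secondsToHMS` mirrors the AppleScript handler, while `estimateTotalSeconds` is a hypothetical variant. It keeps the per-item average unrounded: rounding it to whole seconds first, as the script does, turns a sub-second average (232 s / 500 e-mails ≈ 0.46 s for the run above) into zero. It also counts all not-yet-finished records as remaining, whereas `countRecords - currentRecordNum` in the script leaves out the record about to be processed.

```javascript
// Format a number of seconds as HH:MM:SS
function secondsToHMS(totalSeconds) {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return [h, m, s].map(n => String(n).padStart(2, '0')).join(':');
}

// elapsed: seconds so far; done: records finished; total: records in the group
function estimateTotalSeconds(elapsed, done, total) {
  if (done === 0) return null;          // nothing to average yet
  const perItem = elapsed / done;       // keep the fraction; rounding first can give 0
  return Math.round(elapsed + perItem * (total - done));
}

console.log(secondsToHMS(232));                                 // 00:03:52
console.log(secondsToHMS(estimateTotalSeconds(46, 100, 500)));  // 00:03:50
```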

I suspect the additional file handling in the JXA script’s method is a large part of the slowdown. A machine with a better ratio of disk to CPU performance would likely see smaller percentage increases in processing time, but I still suspect the additional file handling will always be a bottleneck. I’d be interested to hear others’ experience on that front.

For those interested in the space savings, here are the pre- and post-processing 2010 mail folder size comparisons resulting from my script:

Before processing:

| Item | Size |
| --- | --- |
| E-mails with attachments, unchanged | 21MB |
| E-mails with no attachments | 40MB |
| Truncation candidates | 837MB |
| Original total size | 898MB |

After processing:

| Item | Size |
| --- | --- |
| E-mails with attachments, unchanged | 21MB |
| E-mails with no attachments | 40MB |
| Truncated e-mails* | 19MB |
| Extracted attachments | 607MB |
| New total size* | 687MB |

So that’s a 211MB* space saving, and that’s before any tidy up of duplicates.

(*Revised figures after final run of script saw a correction downwards of 36MB of space used, so extra 36MB of space saved)
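The totals can be checked with a few lines of JavaScript using the size figures above; all of the 211MB saving comes from the 837MB of truncation candidates becoming 19MB of truncated e-mails plus 607MB of extracted attachments:

```javascript
// Sizes in MB for the 2010 mail folder, before and after processing
const before = { unchangedWithAttachments: 21, withoutAttachments: 40, truncationCandidates: 837 };
const after  = { unchangedWithAttachments: 21, withoutAttachments: 40,
                 truncatedEmails: 19, extractedAttachments: 607 };

const sum = obj => Object.values(obj).reduce((a, b) => a + b, 0);
const saved = sum(before) - sum(after);

console.log(sum(before), sum(after), saved);               // 898 687 211
console.log((100 * saved / sum(before)).toFixed(1) + '%');  // 23.5%
```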

Which reminds me of another benefit of using the attachments’ incoming links to navigate from the e-mail to its attachments via the Inspector, rather than appending links within the message: if duplicate (or unwanted smallish) attachments are deleted, no dead links are left in the message body.

The extraction of attachments, deletion of smaller ones, processing of MIME parts, and creation of the .mbox file took about 2 hours for the whole 7,218 e-mails from that year.

Importing all 1,380 messages from 2010 with removed attachments took just under a minute, so that’s not a significant impact on total time of processing.

Happy with the results so far, and I’ll run those data sets again after I look at the code and maybe see if I can tighten much up in it.

Sean