Another possibly silly follow-up question: is there an easy way to set up an auto-forwarding rule that forwards all my email to something DT could receive, file, and store for me? The only import options seem limited or difficult. A simple auto-forwarder would make it much easier to take in all those messages, if DT could receive them.
Oh, cool. I hadn’t even looked at one of those links recently. Using the Message ID, which equals the UUID in DEVONthink, makes such linking super easy.
It’s annoying that you can’t use it to select or refer to an e-mail in AppleScript, though. It only opens the e-mail, and then you’d have to read the message details from that window and close it again. I may test that at some stage. Who knows, it may be faster than what I’m doing, and it certainly removes one selection task for the operator (or may be a reliable way to refer to an e-mail from within an automatically-triggered script).
Sean
Thanks for those.
The current version of the script automatically generates an import .mbox filename from the group being scanned, and creates a child group of that group (or utilises an existing child group) named `[groupname] Attachments` (if there is more than one such group, a group selector is shown).
So now it’s just: choose the Mail mailbox where the originals are, then choose a DEVONthink group to scan.
Wish I could find e-mails in Mail directly by UUID – next up is testing how slow that might be if it’s at all possible via opening an e-mail’s URL in Mail, grabbing attributes from that window, then closing.
Then testing the date/time to GMT conversion via `do shell script`, to compare its speed with the Foundation framework method.
After that, I start poring over it to see if I can find some speed-ups/rationalisations.
Sean
I feel like this is going to be a dead end (except for the solution of just trawling through all mailboxes and returning the first one with a match on the `Message-ID`).
The `document` attribute of a window which is displaying an “opened” e-mail is `missing value`. The `name` is the Subject and enclosing mail folder name, but that’s not good enough to be useful, IMO.
Moving on…
Sean
Apple’s scripting dictionary exposes a `message id` for a `message`. As you’d explained to me before, that should be the UUID used by DT. So a nicely crafted `whose` expression might be able to find a message by UUID.
As it stands, a `message` belongs to a `mailbox`. Consequently, you’d have to loop over all `accounts`, then their `mailboxes`, and run a `whose` on their `messages` to find the corresponding message. That’s certainly a PITA.
See this: referencing a message with only "message id" in "Mail" - AppleScript | Mac OS X - MacScripter
Sample code in JavaScript:
```javascript
(() => {
  const dtURL = 'x-devonthink-item://%3CCE12711E-E84B-4EC8-9E82-632A7E67D5EB%40example.com%3E';
  const messageID = decodeURIComponent(dtURL.replace(/^.*:\/\//,'')).replaceAll(/[<>]/g,'');
  console.log(messageID);
  const mailApp = Application("Mail");
  mailApp.accounts().forEach(a => a.mailboxes().forEach(m => {
    const result = m.messages.whose({messageId: messageID});
    if (result.length) {
      console.log(result[0].source());
    }
  }))
})()
```
The example from MacScripter runs a loop also over all mailboxes not in any account. I am not sure if that’s a necessary step here.
Note: The “raw” UUID from DT has to be massaged so that it matches the `message id` of Apple’s Mail (which is not the original message ID in the mail header!):
- First, DT’s UUID must be URL-decoded;
- Then the leading `x-devonthink-item://` must be removed;
- Finally, the enclosing `<>` must be removed.

(not necessarily in this sequence). All that happens in this line of the sample code:
`const messageID = decodeURIComponent(dtURL.replace(/^.*:\/\//,'')).replaceAll(/[<>]/g,'');`
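The three steps listed above can also be spelled out one at a time, which makes them easier to check outside Mail. This is only a sketch in plain JavaScript: `messageIdFromItemLink` is a made-up helper name, not something DEVONthink or Mail provides, and it assumes the link has the same shape as the example URL in the sample code.

```javascript
// Hypothetical helper (not part of DEVONthink or Mail): turn a DEVONthink
// item link for an e-mail into the bare ID that Mail's scripting exposes.
function messageIdFromItemLink(itemLink) {
  // Remove the leading x-devonthink-item:// scheme
  const withoutScheme = itemLink.replace(/^x-devonthink-item:\/\//, '');
  // URL-decode (%3C, %40, %3E become <, @, >)
  const decoded = decodeURIComponent(withoutScheme);
  // Remove the enclosing angle brackets, if present
  return decoded.replace(/^<|>$/g, '');
}

const link = 'x-devonthink-item://%3CCE12711E-E84B-4EC8-9E82-632A7E67D5EB%40example.com%3E';
console.log(messageIdFromItemLink(link));
// prints CE12711E-E84B-4EC8-9E82-632A7E67D5EB@example.com
```

Unlike the one-liner, this only strips brackets at the very start and end, so an ID that happened to contain `<` or `>` elsewhere would be left alone.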
Thanks, I was aware of these issues, hence my more recent message:
I had been hoping that if Apple Mail quickly opens a message from the URL that’s built from the `Message-ID` (so very similar to DEVONthink’s URL), I might be able to use that as a shortcut to determine attributes/properties from the message (and avoid needing to specify the mailbox in the script). That doesn’t work (as I’m sure has been determined and discussed previously).
It is a necessary step if you have local mailboxes – account mailboxes are those such as IMAP online mailboxes, local mailboxes (those without an account attribute when scripting) are not stored online.
I used to use local mailboxes for my archives because of storage limits on prior IMAP servers back in the day, but they were local to the Mac only. Since I have so much iCloud space, I now store them on the iCloud IMAP server, not locally, and my other devices (and iCloud webmail) can see them.
Sean
Foundation framework is much quicker: over 100 translations it added effectively zero overhead, whereas the `do shell script` method added 44 seconds of overhead for 100 translations.
So I’ll stick with Foundation framework for that.
Sean
Searching by message ID is more difficult because the same message can be stored in several mailboxes. Although the message ID should be unique, that only refers to its origin, i.e. the sending server. Once it has arrived, the user can copy it to different mailboxes as often as they wish. Perhaps there are clients that track messages by their ID internally, but I doubt that (not very useful, I guess). Mail definitely doesn’t do it.
And even IMAP doesn’t offer a server-wide search – it always refers to a mailbox: RFC 3501 - INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1
Just curious what happens when a message is copied to another mailbox and its attachments are removed there (manually). When searching for the UUID turns up this message as the first matching one, the attachments are gone…
Not surprisingly, as `doShellScript` requires spawning a process, which takes more time than a procedure call. That is one reason why I try to avoid that command whenever I can.
Unless you ask for all messages `whose message id is <xyz>`, I guess it will return whichever is the first matching message.
Mail definitely doesn’t preclude duplicate messages the way DEVONthink does. I have imported messages with removed attachments into DEVONthink’s Global Inbox while the original full message was still in my e-mail archive database. However, I’d have to delete the original full message before moving (or importing) the truncated message into the e-mail archive database, so that it takes over the identity of that message ID. Otherwise it won’t be imported, or it will import as a replicant of the original message (depending on settings).
Sean
As I understand it, `whose` always returns a list of all matches. At least it does in JXA.
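If every mailbox can return its own list of matches, one way around the "first match has had its attachments removed" worry is to collect all matches first and then prefer the copy that still has the most attachments. The sketch below works on plain objects so it can be tried without Mail. The `mailbox` and `attachmentCount` fields and the `pickFullestCopy` name are assumptions for illustration, not part of Mail's scripting dictionary.

```javascript
// Hypothetical stand-ins for what a script might have collected from Mail:
// one entry per matching message, with the mailbox it was found in and
// how many attachments that copy still has.
function pickFullestCopy(matches) {
  if (matches.length === 0) return null;
  // Keep the earliest match unless a later one has strictly more attachments
  return matches.reduce((best, candidate) =>
    candidate.attachmentCount > best.attachmentCount ? candidate : best
  );
}

const matches = [
  { mailbox: 'Archive 2015 (trimmed)', attachmentCount: 0 },
  { mailbox: 'Archive 2015', attachmentCount: 3 },
];
console.log(pickFullestCopy(matches).mailbox);
// prints Archive 2015
```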
Next edge case – attachments which have not, for whatever reason, been loaded/downloaded into Mail.
155 of them in 2015-2017 for me.
DEVONthink then complains about file errors, and won’t validate/repair the database.
Not too big a deal, a proportion of those (I don’t know how many yet) will have already existing copies in my main database (as I created them or saved them when I received them), or duplicates in the mail archive database.
So I’m just plowing on for now before I decide if I explicitly do something (yet to be determined) with “empty” attachments.
I’ll have more funky edge cases the older I get as I encounter messages I imported (in some long forgotten way) from TeleFinder BBS, while hopefully pine and Eudora will be better from .mbox imports (although I suspect the Eudora ones may well have had the attachments stripped from the mail source already).
Maybe I’ll bite the bullet and start approaching 2014 (the year currently being processed) from “the other side” and see how bad the situation is.
Sean
Forgot to mention: such “missing” attachments are zero-byte files when I use the `import attachments from` command, and that seems to upset DEVONthink a little.
Anyway, after reading @chrillek’s thread about doing a rejig of @mdbraber’s “all in DEVONthink using RTF” method, and their thread about request for access to the source of an e-mail record through scripting, I’m testing doing the latter instead of getting the source from Mail.
Biggest advantages are that:
- I’m only ever in one application
- Which means I don’t need to select, or guess or reference, any given mailbox to find then work on the e-mail’s source
- I won’t need the Foundation framework for the “figure out GMT offset for any date” procedure call, which I was using to recreate the e-mail’s first line (that line isn’t available in Mail’s AppleScript dictionary, but it is in the DEVONthink e-mail record source), so I’m only ever in plain AppleScript
- I don’t need to test whether a message from Mail has attachments or not (I was doing that as a double check in Mail, possibly superfluously), as I’m working on the record that I’ve already tested that for (and only on messages which have attachments being kept in DEVONthink).
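
For anyone wondering what recreating that first line involves, here is a rough sketch. It assumes the “first line” is the mbox `From ` separator line with its date written in UTC. The thread does not spell out the exact format, so treat the layout below (asctime-style date) and the `mboxFromLine` name as assumptions.

```javascript
const DAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Hypothetical helper: build an mbox-style separator line with the date in UTC.
// A JavaScript Date already stores an absolute instant, so the local-time to GMT
// conversion amounts to reading the getUTC* fields.
function mboxFromLine(sender, date) {
  const pad2 = n => String(n).padStart(2, '0');
  const day = String(date.getUTCDate()).padStart(2, ' '); // asctime pads the day with a space
  const time = `${pad2(date.getUTCHours())}:${pad2(date.getUTCMinutes())}:${pad2(date.getUTCSeconds())}`;
  return `From ${sender} ${DAYS[date.getUTCDay()]} ${MONTHS[date.getUTCMonth()]} ${day} ${time} ${date.getUTCFullYear()}`;
}

// A message dated 14:05:09 at UTC+10 is 04:05:09 in GMT
console.log(mboxFromLine('sender@example.com', new Date('2015-03-07T14:05:09+10:00')));
// prints From sender@example.com Sat Mar  7 04:05:09 2015
```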
Just finalising the rejig, will run it against the most recent mail folder which is currently processing the original way and compare timing - I’m expecting it to be noticeably faster, and it is certainly a much shorter script.
Maybe I’ll have time to include some tagging, I’ve just been laying off that for the moment to not have to learn “yet another thing” right now.
Sean
Definitely faster (maybe twice as fast – still not “speedy”, but that’s fine, as it will rarely be run against this many messages).
Took the shortcut of just assigning an `AttachmentsDeleted` tag as I process so I can easily find and delete them before importing the truncated messages in the .mbox – but then I thought “Oh, I haven’t tagged all the other messages I’ve already processed!”
DEVONthink developers to the rescue again – I can create a Smart Group which finds all e-mail in the e-mail archive with incoming links.
Or I can display that as a column in list view…or I can work on it in AppleScript, etc.
So many ways for me to identify the already processed e-mails where attachments are deleted in the .mbox as the kept messages in DEVONthink have links pointing to the messages, which DEVONthink is sensibly counting as incoming links.
Thanks (again) to the developers for a great product with a deep feature set!
Sean
In what respect?
I cancelled an import at one point, then closed DEVONthink and opened it again.
When I did that it reported a database verification failure, and that I should do a repair.
I tried that, and it complained:
I try Repair, and it reports:
Log shows error messages like:
Looking at that type of file, I find these:
Which look like this in e-mail view
And if I deleted the files it stopped complaining.
It’s not what I consider a bug in DEVONthink – except maybe it shouldn’t import empty attachments? Or shouldn’t keep empty imported attachments?
Sean
Usually DEVONthink imports every file unmodified, even empty ones. But the verification considers certain empty files to be most likely broken (e.g. empty PDF documents).
Would it be possible to delete those files imported by DEVONthink which it considers as broken?
Or for verification to actually state what the error is.
With no error message, it’s a little hard to know something has gone “wrong”.
For those who don’t have the Log showing all the time, it can seem unsettling, and I don’t think it should be a requirement to always show the Log.
But maybe that’s just me.
Thanks.
Sean
The error is shown in the Log for each file (as the issue might vary). And via the contextual menu it’s also possible to trash them.
Here’s the current version, hoping to rationalise a few things over the weekend, maybe create some handlers, check the comments after recent changes, but otherwise it likely won’t change much more until I finish processing my remaining mail folders (1990s and 2000s) and deal with anything weird those years throw in my way.
```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- Define size of inline attachments above which we keep the imported attachments and delete the MIME part
set minPictureSize to 50000 -- put in scope to round up to next divisible by 4 number if changed
-- Figure out the base64-encoded size of minPictureSize – we'll add line breaks later (technically, line lengths won't always be 76)
set minbase64size to (round (minPictureSize / 3) rounding up) * 4
-- Let's set up our MIME part delimiting stuff
set partDelimiter to "\n--"
set currentRecordNum to 0

tell application id "DNtp"
    activate
    -- Let's set the group for messages
    set emailsGroup to display group selector "Select a group with e-mails:" buttons {"Cancel", "OK"}
    set emailsGroupName to the name of emailsGroup
    -- Set .mbox name based on emailsGroupName and delete existing file
    set mboxFileName to emailsGroupName & " Import.mbox"
    set mboxFile to (((path to desktop folder) as string) & mboxFileName)
    try
        tell application "Finder" to delete file mboxFile
    end try
    -- Let's set the group for attachments
    set attachmentsGroupName to emailsGroupName & " Attachments"
    --*2025-08-01 This was not working for counts ≠ 0; it now takes item 1 of the filtered list (untested)
    set matchingGroups to (children of emailsGroup whose name is attachmentsGroupName)
    if (count of matchingGroups) = 0 then
        set attachmentsGroup to create record with {name:attachmentsGroupName, type:group} in emailsGroup
    else if (count of matchingGroups) = 1 then
        set attachmentsGroup to item 1 of matchingGroups
    else
        set attachmentsGroup to display group selector "Select a group for attachments:" buttons {"Cancel", "OK"}
    end if
    --/2025-08-01
    -- Set up some variables for the progress indicator
    set countRecords to (count children of emailsGroup)
    set startTime to current date
    set elapsedTimeHMS to ""
    -- Let's go!
    try
        -- Where are we up to?
        show progress indicator "Separating attachments…" steps (countRecords)
        -- Go through the group with the e-mails
        repeat with currentItem in (children of emailsGroup)
            -- Which record are we at?
            set currentRecordNum to currentRecordNum + 1
            -- Some more progress indicator stuff once we've passed one record processed
            if currentRecordNum > 1 then
                set elapsedTime to round ((current date) - startTime)
                set perItemTime to round (elapsedTime / (currentRecordNum - 1))
                set totalEstTime to elapsedTime + perItemTime * (countRecords - currentRecordNum)
                set elapsedTimeHMS to my secondsToHMS(elapsedTime)
                --set elapsedTimeHMS to secondsToHMS from elapsedTime
                set totalEstTimeHMS to my secondsToHMS(totalEstTime)
                set timeProgressString to " (" & elapsedTimeHMS & "/" & totalEstTimeHMS & ")"
            else -- or for first record
                set timeProgressString to ""
            end if
            -- And show the calculated progress numbers and current record name
            step progress indicator (currentRecordNum as string) & "/" & (countRecords as string) & timeProgressString & ": " & name of currentItem
            -- Only process records with attachments
            if attachment count of currentItem > 0 then
                -- new message, nothing deleted yet
                set attachmentsDeleted to false
                -- We need the DEVONthink item link because we'll put that into the URL field of the imported attachments we're keeping in DEVONthink
                set recordRefURL to the reference URL of currentItem
                -- Now we import the attachments and hold onto a reference to them all
                set importedAttachments to import attachments of record currentItem to attachmentsGroup
                -- Looping through those attachments, we'll delete pictures smaller than minPictureSize (50,000 bytes), and set the URL to the e-mail for everything else
                repeat with currentAttachment in importedAttachments
                    if (the record type of currentAttachment) as string = "picture" and (the size of currentAttachment) < minPictureSize then
                        delete record currentAttachment
                    else
                        set the URL of currentAttachment to recordRefURL
                        -- we also set the flag for this item on whether to do anything with the source data (any time we keep an attachment in DEVONthink we want to delete it from the mail source and save that modified source in the mbox file)
                        set attachmentsDeleted to true
                        set tags of currentItem to {"AttachmentsDeleted"}
                    end if
                end repeat
                -- Once attachments processed, if any were kept, let's work on the e-mail for inclusion in the .mbox file
                if attachmentsDeleted then
                    set recordPath to the path of currentItem
                    set messageMboxText to ""
                    set messageSource to (read POSIX file recordPath)
                    -- We do this so we can split the message by MIME parts (each text item is a MIME part)
                    set AppleScript's text item delimiters to partDelimiter
                    set messageParts to every text item of messageSource
                    set countMessageParts to count of messageParts
                    -- Grab everything up to the actual first MIME part
                    set messagePreamble to item 1 of messageParts
                    -- Grab the last MIME part (which will be a boundary ending spec)
                    set messagePostamble to item countMessageParts of messageParts
                    -- Start the .mbox data with the Preamble
                    set messageMboxText to messageMboxText & messagePreamble & partDelimiter
                    -- Loop through everything except the first and last parts
                    repeat with partNumber from 2 to (countMessageParts - 1)
                        -- Now we inspect and act on each part as necessary
                        set currentPart to item partNumber of messageParts
                        -- Attachments will always have the encoded part after two newlines, so now we split the currentPart using two newlines
                        -- ** This may be able to be swapped around with the Content-Type tests following
                        set AppleScript's text item delimiters to "\n\n"
                        -- Is there more than one section to this MIME part separated by two newlines?
                        -- Let's do something if there is
                        if (count text items of currentPart) > 1 then
                            -- Let's test for the inline Content-Types we want to consider actions for
                            -- Firstly inline images
                            if ((text item 1 of currentPart contains "Content-Type: image/")) and (text item 1 of currentPart contains "Content-Disposition: inline") then
                                -- grab the base64 data
                                -- This is for when inline or image parts don't have a second interpretable part 2025-07-28
                                try
                                    set encodedData to text item 2 of currentPart
                                    -- Check the line length of the base64 encoded data
                                    set b64LineLength to count characters of (first paragraph of encodedData)
                                    -- Add the base minimum base64 encoded size to the appropriate number of newline characters
                                    set minMIMEEncodedSize to minbase64size + (round (minbase64size / b64LineLength))
                                    -- Grab the size of the attachment's encoded data
                                    set currentPartSize to count characters of encodedData
                                    -- Is the encoded data smaller than our minimum size?
                                    if currentPartSize < minMIMEEncodedSize then
                                        -- If so, write that small encoded data into our .mbox variable
                                        set messageMboxText to messageMboxText & currentPart & partDelimiter
                                    end if
                                on error
                                    set messageMboxText to messageMboxText & currentPart & partDelimiter
                                end try
                                -- For non-inline images, or application/ Content-Types
                            else if ((text item 1 of currentPart contains "Content-Type: image/") or (text item 1 of currentPart contains "Content-Type: application/")) then
                                -- Do nothing - i.e. don't add them to the .mbox variable
                                -- And for anything that doesn't match the above Content-Type and size tests and exclusions
                                -- just write the part to .mbox variable (usually text/plain and text/html, for example)
                            else
                                set messageMboxText to messageMboxText & currentPart & partDelimiter
                            end if
                            -- And for "single" item MIME parts (i.e. no double newlines), just write the part to .mbox variable
                        else
                            set messageMboxText to messageMboxText & currentPart & partDelimiter
                        end if
                        -- Finish loop of message parts
                        set AppleScript's text item delimiters to ""
                    end repeat
                    -- Reset again in case the loop above never ran (message with fewer than three parts)
                    set AppleScript's text item delimiters to ""
                    -- We've inspected all the parts and taken appropriate action
                    -- let's write the last MIME part (last boundary closure)
                    set messageMboxText to messageMboxText & messagePostamble
                    -- ** Set up write to .mbox file here
                    try
                        set openMboxFile to open for access file mboxFile with write permission
                        write (messageMboxText & "\n\n") to openMboxFile starting at eof
                        close access openMboxFile
                    on error
                        try
                            close access openMboxFile
                        end try
                    end try
                    -- Finish with messages where attachments have been deleted
                end if
            end if
            -- End loop of the records in the e-mails group
        end repeat
        hide progress indicator
        --end try
    on error error_message number error_number
        hide progress indicator
        if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
    end try
    -- Exit DEVONthink tell block
end tell
--end timeout
beep 4
-- Coerce the count to text first, otherwise & builds a list rather than a string
display dialog (currentRecordNum as string) & " records in " & elapsedTimeHMS
-- And we're done!

on secondsToHMS(theSecs)
    set h to theSecs div 3600
    set s to theSecs - h * 3600
    set m to s div 60
    set s to s - m * 60
    -- "text -2 thru -1" returns a string directly, so the result doesn't depend on the current text item delimiters
    set HMSString to (text -2 thru -1 of ((h + 100) as string)) & ":" & (text -2 thru -1 of ((m + 100) as string)) & ":" & (text -2 thru -1 of ((s + 100) as string))
    return HMSString
end secondsToHMS

on firstRun()
end firstRun
```
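
The size test in the script compares encoded text rather than raw bytes, so the 50,000-byte threshold first has to be converted to its base64 size, plus one newline per encoded line. As a worked example of that arithmetic, here it is in plain JavaScript. The `base64EncodedSize` name is made up, and the 76-character default is an assumption (the script measures the line length from each message instead).

```javascript
// Hypothetical helper mirroring the script's arithmetic:
// every 3 raw bytes become 4 base64 characters (rounded up to a whole group),
// then roughly one newline is added per encoded line.
function base64EncodedSize(rawBytes, lineLength = 76) {
  const encodedChars = Math.ceil(rawBytes / 3) * 4;
  const lineBreaks = Math.round(encodedChars / lineLength);
  return encodedChars + lineBreaks;
}

console.log(base64EncodedSize(50000));
// prints 67545 (66668 encoded characters + 877 newlines)
```

One small difference to be aware of: AppleScript's `round` defaults to rounding halves to the nearest even number, while `Math.round` rounds halves up, so the two can differ by one when the division lands exactly on .5.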
Did some speed testing tonight, and I gotta say, after comparing, I’m pretty chuffed with the performance.
500 random e-mails from 2010 processed in 3m52s – 102 of those had attachments, my script processed just those 102 e-mails in 3m41s, so not much burden checking the extra 398 e-mails rather than filtering on or selecting only attachment-laden messages.
I’ll leave others to test on their own data sets on their own computers, but for the above data sets on my computer, that was in comparison to 18m00s for the 500 e-mails and 14m00s for the 102 e-mails in JXA (both manually-timed JXA results slightly rounded down to nearest minute – my script times itself).
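To put those timings side by side, here is the arithmetic as a small plain-JavaScript sketch. The numbers are the ones quoted above, and the variable names are only for illustration.

```javascript
// Convert a "3m52s"-style timing into seconds.
function toSeconds(text) {
  const match = /^(\d+)m(\d+)s$/.exec(text);
  if (!match) throw new Error(`Unrecognised timing: ${text}`);
  return Number(match[1]) * 60 + Number(match[2]);
}

// Timings quoted above (the JXA figures were hand-timed and rounded down).
const timings = {
  all500: { appleScript: toSeconds('3m52s'), jxa: toSeconds('18m00s') },
  with102: { appleScript: toSeconds('3m41s'), jxa: toSeconds('14m00s') },
};

const speedup = ({ appleScript, jxa }) => jxa / appleScript;
// Extra time spent checking the 398 e-mails that have no attachments
const scanOverhead = timings.all500.appleScript - timings.with102.appleScript;

console.log(speedup(timings.all500).toFixed(1));  // prints 4.7
console.log(speedup(timings.with102).toFixed(1)); // prints 3.8
console.log(scanOverhead);                        // prints 11
```

So on this data set the AppleScript version was roughly four to five times faster, and checking the attachment-free messages cost about 11 seconds in total.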
I suspect the additional file handling load from the JXA script’s method is a large part of the slowdown, and while a machine with a better ratio of disk to CPU performance would likely see lesser percentage increases in processing time, I still suspect that the additional file handling will always be a bottleneck. I’d be interested to hear others’ experience on that front.
For those interested in the space savings, here are the pre- and post-processing 2010 mail folder size comparisons resulting from my script:
Before processing:

Item | Size |
---|---|
E-mails with attachments, unchanged | 21MB |
E-mails without attachments | 40MB |
Truncation candidates | 837MB |
Original Total Size | 898MB |

After processing:

Item | Size |
---|---|
E-mails with attachments, unchanged | 21MB |
E-mails without attachments | 40MB |
Truncated e-mails* | 19MB |
Extracted attachments | 607MB |
New Total Size* | 687MB |

So that’s a 211MB* space saving, and that’s before any tidy up of duplicates.
(*Revised figures after final run of script saw a correction downwards of 36MB of space used, so extra 36MB of space saved)
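The totals and the saving can be re-derived from the individual rows, as in this small plain-JavaScript check (sizes in MB, taken from the figures above; the property names are only labels).

```javascript
// Sizes in MB, as listed in the tables above
const before = { unchanged: 21, noAttachments: 40, truncationCandidates: 837 };
const after = { unchanged: 21, noAttachments: 40, truncated: 19, extractedAttachments: 607 };

const total = sizes => Object.values(sizes).reduce((sum, mb) => sum + mb, 0);

const saved = total(before) - total(after);
const savedPercent = (saved / total(before)) * 100;

console.log(total(before), total(after), saved); // prints 898 687 211
console.log(savedPercent.toFixed(1));            // prints 23.5
```

All of the saving comes from the truncation candidates: 837MB of original messages became 19MB of truncated e-mails plus 607MB of extracted attachments.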
Which reminds me of another benefit of using the incoming links from the attachments to the e-mail to navigate from the e-mail to its attachments via the Inspector, as opposed to appending links within the message – if duplicate (or unwanted smallish) attachments are deleted, there won’t be any dead links left in the message body.
The extraction of attachments, deletion of smaller ones, processing of MIME parts, and creation of the .mbox file took about 2 hours for the whole 7,218 e-mails from that year.
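For readers who find the MIME-part step easier to follow outside AppleScript, here is a rough plain-JavaScript rendering of the same idea: split the source on `\n--`, keep text parts and small inline images, and drop other image and application parts. It is a simplified sketch, not a port. `stripLargeAttachments` is a made-up name, the caller passes the encoded-size threshold directly, and everything after the first blank line of a part is treated as its body.

```javascript
// Simplified sketch of the part filter (assumptions: "\n--" separates parts,
// and headers end at the first blank line of each part).
function stripLargeAttachments(source, minEncodedSize) {
  const delimiter = '\n--';
  const parts = source.split(delimiter);
  if (parts.length < 3) return source; // nothing between preamble and closing boundary

  const kept = [parts[0]]; // preamble (message headers)
  for (const part of parts.slice(1, -1)) {
    const blank = part.indexOf('\n\n');
    if (blank === -1) { kept.push(part); continue; } // no body section: keep as-is
    const headers = part.slice(0, blank);
    const body = part.slice(blank + 2);
    const isImage = headers.includes('Content-Type: image/');
    const isApplication = headers.includes('Content-Type: application/');
    const isInline = headers.includes('Content-Disposition: inline');
    if (isImage && isInline) {
      if (body.length < minEncodedSize) kept.push(part); // small inline image stays
    } else if (!(isImage || isApplication)) {
      kept.push(part); // text/plain, text/html, nested multipart headers, etc.
    }
  }
  kept.push(parts[parts.length - 1]); // closing boundary
  return kept.join(delimiter);
}

// Tiny made-up message: one text part, one PDF attachment, one small inline image
const sample = [
  'Subject: test\nContent-Type: multipart/mixed; boundary="b"\n',
  'b\nContent-Type: text/plain\n\nHello',
  'b\nContent-Type: application/pdf\nContent-Disposition: attachment\n\nAAAA',
  'b\nContent-Type: image/png\nContent-Disposition: inline\n\nQUJD',
  'b--\n',
];
const trimmed = stripLargeAttachments(sample.join('\n--'), 100);
console.log(trimmed.includes('application/pdf')); // prints false
```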
Importing all 1,380 messages from 2010 with removed attachments took just under a minute, so that’s not a significant impact on total time of processing.
Happy with the results so far, and I’ll run those data sets again after I look at the code and maybe see if I can tighten much up in it.
Sean