Separate/import e-mail attachments for better search V2

mdbraber · April 5, 2022, 10:32am

Last year I wrote a script to separate attachments from e-mails (.eml files). Recently I updated this script to make it more effective and efficient.

The script does the following:

Check (via Python script) each individual email for non-inline attachments that should be imported into DT. The script ignores e.g. .ics attachments or winmail.dat attachments - see the Python script for what it ignores.
Convert the e-mail to RTF and check which attachments from the RTF match the found attachments which should be replaced
Import the found attachments into DT
Create a JSON string for the replacements with the imported attachment names and reference URLs (x-devonthink-item links)
Call the Python script again with the created JSON. The script will strip the found attachments (replacing each one with empty content) and add an inline HTML part to the e-mail with a list of links to the DT items
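The last step can be sketched on its own with the standard-library email module. This is a minimal illustration, not the full script further down: the function name strip_and_link, the sample message and the EXAMPLE link are all made up for the demonstration.

```python
from email.message import EmailMessage


def strip_and_link(msg, replacements):
    """Blank out the named attachments and append an inline HTML list of links.

    replacements is a list of (attachment name, link) pairs; both the helper
    name and the pair layout are assumptions made for this sketch.
    """
    names = {name for name, _ in replacements}
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_disposition() == "attachment" and part.get_filename() in names:
            # set_content("") drops the old Content-* headers and the payload
            part.set_content("")
    items = "".join(
        "<li><a href='{}'>{}</a></li>".format(url, name) for name, url in replacements
    )
    html = "<html><body><ul>{}</ul></body></html>".format(items)
    msg.add_attachment(html, subtype="html", disposition="inline")
    return msg


# Hypothetical sample message with one 1000-byte attachment
sample = EmailMessage()
sample["Subject"] = "Some Subject"
sample.set_content("Body text")
sample.add_attachment(b"x" * 1000, maintype="application", subtype="pdf",
                      filename="report.pdf")

strip_and_link(sample, [("report.pdf", "x-devonthink-item://EXAMPLE")])
parts = list(sample.iter_parts())
```

After the call the message has three parts: the body, the emptied attachment part and the new HTML part with the links.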

The script consists of two parts:

AppleScript replace-attachments.scpt
Python script replace-attachments.py

To install the script(s):

Save the AppleScript and Python script and put them in the same directory - check the properties at the top of the AppleScript so the scripts can be found
Make sure you have python3 installed (e.g. using brew install python3) and make sure you have the needed modules by using pip install if needed (see the top of the Python script).

To use the script(s):

Create a selection in DT, easiest is to select only e-mails with attachments by searching for md_attachments>0
The AppleScript is run on each individual message. This works fast enough for me (e.g. 7 min for 1200 messages with attachments on my M1 Pro MacBook)

Tips:

You can run this script on each individual message, so you can also use it in a Smart Rule, e.g. when importing (e-mail) messages. Be aware that the Foundation framework sometimes doesn’t work in an external script, while it mostly seems to work in inline scripts
The script uses some Foundation framework functions to convert to/from JSON. You could also put these functions in a separate helper script.
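On the Python side, the JSON exchanged between the two scripts is simple: the first pass prints a list of attachment names, and the second pass receives a list of [name, reference URL] pairs via -r. A small sketch of that round trip follows; the helper name build_replace_argument, the file names and the EXAMPLE links are placeholders, not values from a real database.

```python
import json


def build_replace_argument(found_json, reference_urls):
    """Pair each found attachment name with its link and serialize to JSON.

    found_json: JSON list of names, as printed by the first pass.
    reference_urls: hypothetical mapping of name -> x-devonthink-item link.
    """
    found = json.loads(found_json)
    pairs = [[name, reference_urls[name]] for name in found if name in reference_urls]
    return json.dumps(pairs)


# Placeholder data standing in for what the AppleScript would collect
found_json = '["report.pdf", "notes.txt"]'
urls = {
    "report.pdf": "x-devonthink-item://EXAMPLE-1",
    "notes.txt": "x-devonthink-item://EXAMPLE-2",
}
replace_argument = build_replace_argument(found_json, urls)
```

The resulting string is what would be passed (shell-quoted) after -r; the Python script reads item[0] as the name and item[1] as the link.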

Applescript (replace-attachments.scpt):

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"

property ca : a reference to current application
property pythonCmd : "/usr/bin/env python3"
property pythonScriptName : "replace-attachments.py"
property replacedTagName : "attachments-extracted"

tell application "Finder"
	set currentPath to POSIX path of ((container of (path to me)) as alias)
	set replaceCmd to pythonCmd & " " & quoted form of currentPath & pythonScriptName & " "
end tell

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	
	repeat with theRecord in theSelection
		-- second repeat loop so we can mimic the behavior of the 'continue'
		-- command which doesn't exist in AppleScript
		repeat 1 times
			if type of theRecord is unknown and path of theRecord ends with ".eml" and (tags of theRecord does not contain replacedTagName) then
				set recordPath to path of theRecord
				
				-- check if there are any attachments to replace; otherwise proceed to next e-mail
				set foundAttachmentsJSON to do shell script replaceCmd & (quoted form of recordPath)
				if foundAttachmentsJSON is not equal to "" then
					set foundAttachments to my fromJSON(foundAttachmentsJSON)
				else
					exit repeat
				end if
				
				-- set details of e-mail to variables
				-- (referencing details directly in statements sometimes results in weird errors)
				set recordReferenceURL to reference URL of theRecord
				set recordSubject to name of theRecord
				set recordModificationDate to modification date of theRecord
				set recordCreationDate to creation date of theRecord
				set recordAdditionDate to addition date of theRecord
				set recordGroup to missing value
				set extractedAttachments to {}
				
				-- convert the e-mail to RTF format
				set rtfRecord to convert record theRecord to rich
				
				try
					if type of rtfRecord is rtfd then
						set rtfPath to path of rtfRecord
						
						tell text of rtfRecord
							if exists attachment in attribute runs then
								tell application "Finder"
									set rtfAttachmentList to every file in ((POSIX file rtfPath) as alias)
									repeat with rtfAttachment in rtfAttachmentList
										set rtfAttachmentName to name of rtfAttachment as string
										if rtfAttachmentName is in foundAttachments then
											-- importing skips files inside record database package, so move record to a temporary folder first
											set rtfAttachment to move (rtfAttachment as alias) to tmpFolder with replacing
											tell application id "DNtp"
												
												-- create a group if needed
												if recordGroup is missing value then
													set recordGroup to create record with {name:recordSubject, type:group, creation date:recordCreationDate, modification date:recordModificationDate, addition date:recordAdditionDate} in (parent 1 of theRecord)
												end if
												
												-- import the attachment
												set importedItem to import (POSIX path of (rtfAttachment as string)) to recordGroup
												
												-- link imported item to original e-mail
												set URL of importedItem to recordReferenceURL
												
												-- set dates of importeditem to the original e-mail
												set modification date of importedItem to recordModificationDate
												set creation date of importedItem to recordCreationDate
												
												-- add this attachment to the list of extracted attachments
												set end of extractedAttachments to {rtfAttachmentName, ((reference URL of importedItem) as string)}
												
												log "Found attachment \"" & rtfAttachmentName & "\" to remove from e-mail " & recordSubject
											end tell
										end if
									end repeat
								end tell
								
								if (count of extractedAttachments) is greater than 0 then
									-- convert list of extracted attachments to JSON
									set extractedAttachmentsJSON to my toJSON(extractedAttachments)
									tell application id "DNtp"
										-- move the e-mail to the group with attachments
										move record theRecord to recordGroup
										-- run Python script to replace attachments based on given JSON
										do shell script replaceCmd & "-r " & quoted form of extractedAttachmentsJSON & " " & quoted form of recordPath
										log "Removed attachments from \"" & recordSubject & "\""
										-- add a tag so we know this e-mail has been processed
										set tags of theRecord to (tags of theRecord) & {replacedTagName}
									end tell
								end if
								
							end if
						end tell
					end if
				on error error_message number error_number
					if error_number is not -128 then display alert "Replace attachments" message error_message as warning
				end try
				
				-- remove the temporary record
				delete record rtfRecord
			end if
		end repeat
	end repeat
end tell

on fromJSON(strJSON)
	set {x, e} to ca's NSJSONSerialization's JSONObjectWithData:((ca's NSString's stringWithString:strJSON)'s dataUsingEncoding:(ca's NSUTF8StringEncoding)) options:0 |error|:(reference)
	
	if x is missing value then error e's localizedDescription() as text
	if e ≠ missing value then error e
	
	if x's isKindOfClass:(current application's NSDictionary) then
		return x as record
	else
		return x as list
	end if
end fromJSON

on toJSON(theData)
	set theJSONData to ca's NSJSONSerialization's dataWithJSONObject:theData options:0 |error|:(missing value)
	set JSONstr to (ca's NSString's alloc()'s initWithData:theJSONData encoding:(ca's NSUTF8StringEncoding)) as text
	return JSONstr
end toJSON

Python (replace-attachments.py):

#!/usr/bin/env python3
import argparse
import email
from email import policy
import uuid
import logging
import json

# Adapted from https://github.com/Conengmo/emailstripper/blob/master/emailstripper/run_remove_attachments.py
IMAGE_EXTENSIONS = ('.jpg','.jpeg','.png','.gif','.tiff','.tif', '.bmp')
IMAGE_MIN_SIZE_KB = 150
IGNORE_EXTENSIONS = ('.dat','.rtf', '.ics')
IGNORE_ATTACHMENTS = ('winmail.dat','application')

def walk_attachments(filename, replace_dict):
    
    # open file for reading
    try:
        reader = open(filename, "rb")
    except IOError as e:
        # logging uses %-style placeholders; stop here because there is nothing to parse
        logging.error("Can't open file %s: %s", filename, e)
        return

    # create an EmailMessage object to analyze
    msg = email.message_from_binary_file(reader,policy=policy.default)

    found_list = []
    replace = len(replace_dict) > 0

    # find attachments and replace if needed
    found_list = walk_over_parts(msg, found_list, filename, replace)
    
    # if attachments are found
    if len(found_list) > 0:
        logging.info('Found {} attachments to replace in {}'.format(len(found_list), filename))
        
        # only replace if number of found attachments matches number of replacements
        # we assume the replacements match the found attachments (not checked) 
        if replace and len(found_list) == len(replace_dict):

            # add replacements in original e-mail
            msg.add_attachment(get_replace_text(replace_dict), disposition='inline', subtype="html")
    
            # write replaced content
            with open(filename, 'w') as writer:
                try:
                    writer.write(msg.as_string())
                except UnicodeEncodeError as e:
                    # UnicodeEncodeError has no .msg attribute; log the exception itself
                    logging.error(e)
                    return

        # if there are no replacements, only output found attachments
        elif len(replace_dict) == 0: 
            print(json.dumps(found_list))
            return

        # otherwise something went wrong
        else:
            logging.error("Number of found attachments does not match number of replacements")
            return
    else:
        logging.info("No attachments found to replace")
        return
                
def walk_over_parts(parent, found_list, filename, replace = False):

    # we're done if the parent is not a multi-part message
    if not parent.is_multipart():
        return found_list
    
    # iterate over all parts of the message
    for i, part in enumerate(parent.get_payload()):

        # skip plain or html content that isn't an attachment
        if part.get_content_type() in ["text/plain", "text/html"] and not part.is_attachment():
            continue

        # recursively check multipart parts
        if part.is_multipart():
            # pass 'replace' along so nested attachments are stripped as well
            found_list = walk_over_parts(part, found_list, filename, replace)
            continue

        # find size and name of attachment
        content_size, attachment_name = parse_attachment(part)

        # check if this is something we need to replace
        # if we don't check inline attachments part of this statement is superfluous, but we leave it here for clarity 
        if not (content_size is None or (attachment_name.endswith(IMAGE_EXTENSIONS) and content_size < (IMAGE_MIN_SIZE_KB * 1e3)) or (attachment_name.endswith(IGNORE_EXTENSIONS))):
            
            if replace:
                logging.info('Removing attachment {} with size {:.0f} kB.'.format(attachment_name, content_size / 1e3))
                payload = parent.get_payload()
                # clear the content from the attachment
                # payload.pop(i) does not work in tests, so this also is OK
                payload[i].set_content("")
                parent.set_payload(payload)
            
            # append attachment to list of found items
            found_list.append(attachment_name)       
   
    return found_list


def parse_attachment(part):
    # only get real attachments - add 'inline' if you also want inline attachments
    if not part.get_content_disposition() in ['attachment']:
        return None, None

    # try to get attachment name
    attachment_name = part.get_filename()

    # try to get attachment name via default method, otherwise skip
    if attachment_name is None:
        attachment_name = create_default_name(part)
    if attachment_name is None:
        return None, None

    # do not consider inline images as relevant (this might be superfluous)
    if attachment_name.endswith(IMAGE_EXTENSIONS) and part.get_content_disposition() == "inline":
        return None, None
    # skip IGNORE_ATTACHMENTS
    elif attachment_name in IGNORE_ATTACHMENTS:
        return None, None
     
    # calculate attachment size (to ignore too small attachments)
    content = part.get_payload()
    assert type(content) is str
    # https://stackoverflow.com/questions/11761889/get-image-file-size-from-base64-string
    content_size = (len(content) * 3) / 4 - content.count('=', -2)

    return content_size, attachment_name

""" Create a default name for a part"""
def create_default_name(part):
    for tup in part._headers:
        if tup[0] == 'Content-Type':
            """tup[1][6:] extracts 'png' from 'image/png' for example. Sometimes the value is image/x-png...
               Somehow, the 'x-' doesn't pose a problem. Not sure how it gets removed."""
            return part.get_content_disposition() + '-' + str(uuid.uuid4()) + '.' + tup[1][6:]

""" Create HTML for replacement text"""
def get_replace_text(found_list):
    replace_text = ""
    for item in found_list:
        replace_text = "\n\n<li><a href='{}?reveal=1'>{}</a></li>\r\n".format(item[1], item[0]) + replace_text
    return "<html><body style='font-family: helvetica; font-size: large;'><br/><br/><hr><p><strong>Attachments:</strong><ul>{}</ul><p></body></html>".format(replace_text)


if __name__ == '__main__':
    # set logging configuration
    logging.basicConfig(level = logging.INFO, format='%(asctime)s %(levelname)s %(message)s')

    # parse arguments
    parser = argparse.ArgumentParser(description='Replace attachments')
    parser.add_argument('filename',type=str, help='.eml file to parse')
    parser.add_argument('-r',dest='replace', help='replace found attachments with DEVONthink links')
    args = parser.parse_args()

    # only process .eml files
    replace_dict = {}
    if args.filename.endswith('.eml'):
        # check if we need to replace (otherwise found attachments are just printed)
        if args.replace:
            try:
                replace_dict = json.loads(args.replace)
            except ValueError as e:
                logging.error("JSON error: %s", e)
  
        walk_attachments(args.filename, replace_dict)
    else:
        logging.error("Filename needs to end with .eml")
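
The size estimate in parse_attachment relies on base64 turning every 3 bytes into 4 characters, with up to two '=' padding characters at the end. A quick check of that arithmetic follows; the function name estimated_decoded_size is only for this illustration.

```python
import base64


def estimated_decoded_size(b64_text):
    """Same formula as in parse_attachment: 3/4 of the length minus padding."""
    return (len(b64_text) * 3) / 4 - b64_text.count("=", -2)


# For unwrapped base64 the estimate is exact
sizes = {}
for n in (1, 2, 3, 10, 1000):
    encoded = base64.b64encode(bytes(n)).decode("ascii")
    sizes[n] = estimated_decoded_size(encoded)

# Line-wrapped base64 (as commonly found in e-mails) contains newlines,
# which count towards len(), so the estimate comes out a little high
wrapped = base64.encodebytes(bytes(1000)).decode("ascii")
wrapped_estimate = estimated_decoded_size(wrapped)
```

Because the payload returned by get_payload() may still contain line breaks, the number should be read as an approximation, which is sufficient for the IMAGE_MIN_SIZE_KB threshold.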

cgrunenberg · April 6, 2022, 12:38pm

Thanks for sharing this script!

AWD · July 23, 2022, 6:37am

Hi mdbraber,

thank you very much for sharing your scripts. I have tried to get it running, but I am facing some issues.

I’ve installed python3 via brew

brew install python3

that worked out, then I tried to install the needed packages you have in your script

pip install argparse
pip install uuid

works fine, but

pip install email

returns

Collecting email
  Using cached email-4.0.2.tar.gz (1.2 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I checked. setuptools is installed.

pip install setuptools

Requirement already satisfied: setuptools in /opt/homebrew/lib/python3.9/site-packages (62.3.2)

Same for

pip install logging

Collecting logging
  Using cached logging-0.4.9.6.tar.gz (96 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [19 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 14, in <module>
        File "/opt/homebrew/lib/python3.9/site-packages/setuptools/__init__.py", line 16, in <module>
          import setuptools.version
        File "/opt/homebrew/lib/python3.9/site-packages/setuptools/version.py", line 1, in <module>
          import pkg_resources
        File "/opt/homebrew/lib/python3.9/site-packages/pkg_resources/__init__.py", line 83, in <module>
          __import__('pkg_resources.extern.packaging.specifiers')
        File "/opt/homebrew/lib/python3.9/site-packages/pkg_resources/_vendor/packaging/specifiers.py", line 24, in <module>
          from .utils import canonicalize_version
        File "/opt/homebrew/lib/python3.9/site-packages/pkg_resources/_vendor/packaging/utils.py", line 8, in <module>
          from .tags import Tag, parse_tag
        File "/opt/homebrew/lib/python3.9/site-packages/pkg_resources/_vendor/packaging/tags.py", line 5, in <module>
          import logging
        File "/private/var/folders/7s/0kzpjfrn2bn8h9ltz3xgdrb00000gn/T/pip-install-s5h4uv5y/logging_0425c0818c524fc9affab4f2b06c4cd8/logging/__init__.py", line 618
          raise NotImplementedError, 'emit must be implemented '\
                                   ^
      SyntaxError: invalid syntax
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Additionally Json package couldn’t be found

pip install json
ERROR: Could not find a version that satisfies the requirement json (from versions: none)
ERROR: No matching distribution found for json

But there is Json5 as a package is this also okay?

Regarding the issues with email and logging could you please help me out with some tip on how to fix this? I don’t have any experience with python. In any case thank you very much.

Best regards
AWD

mdbraber · July 23, 2022, 7:10am

email, argparse and uuid are all part of Python’s standard library. When you run pip install email you’re installing (confusingly, I admit) a 3rd party package that is also called email. So try running the scripts without doing any pip install
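
A quick way to confirm this is to ask Python where it would load each module from. The sketch below only uses importlib from the standard library; the helper names module_origins and looks_third_party are made up here, and the site-packages check is a rough heuristic rather than a definitive test.

```python
import importlib.util

# Modules imported at the top of replace-attachments.py
REQUIRED = ("argparse", "email", "uuid", "logging", "json")


def module_origins(names=REQUIRED):
    """Return {module name: file it would be imported from, or None if missing}."""
    origins = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


def looks_third_party(origins):
    """Heuristic: flag modules that resolve to a site-packages or dist-packages folder."""
    return [
        name for name, origin in origins.items()
        if origin and ("site-packages" in origin or "dist-packages" in origin)
    ]


origins = module_origins()
for name, origin in origins.items():
    print(name, "->", origin)
```

If every module resolves to a path and none is flagged, nothing needs to be installed with pip.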

AWD · July 23, 2022, 7:57am

Hi,

and are logging and json needed?

these are the packages I have installed now:

Package               Version
--------------------- -------
json5                 0.9.8
pip                   22.2
policy                1.0.0
PyQt3D                5.15.5
PyQt5                 5.15.7
PyQt5-sip             12.11.0
PyQtChart             5.15.6
PyQtDataVisualization 5.15.5
PyQtNetworkAuth       5.15.5
PyQtPurchasing        5.15.5
PyQtWebEngine         5.15.6
QScintilla            2.13.3
setuptools            63.2.0
uuid                  1.30
wheel                 0.37.1

The script doesn’t run properly

It creates an empty group with the same name as the .eml file but that’s all.

BR
AWD

mdbraber · July 23, 2022, 7:59am

Those are also standard Python modules - no pip install needed

mdbraber · July 23, 2022, 8:00am

Best is to first try debugging by running the script in Script Editor (or Script Debugger if you use that) and look at the logging window.

AWD · July 23, 2022, 10:31am

I hope I did it correctly

tell current application
	path to current application
		--> alias "Macintosh HD:Users:awd:Library:Application Scripts:com.devon-technologies.think3:Menu:Import:replace-attachments.scpt"
end tell
tell application "Finder"
	get container of alias "Macintosh HD:Users:awd:Library:Application Scripts:com.devon-technologies.think3:Menu:Import:replace-attachments.scpt"
		--> alias "Macintosh HD:Users:awd:Library:Application Scripts:com.devon-technologies.think3:Menu:Import:"
end tell
tell application "DEVONthink 3"
	get selection
		--> {content id 28379 of database id 3}
	path to temporary items
		--> alias "Macintosh HD:private:var:folders:7s:0kzpjfrn2bn8h9ltz3xgdrb00000gn:T:TemporaryItems:"
	get type of content id 28379 of database id 3
		--> unknown
	get path of content id 28379 of database id 3
		--> "/Users/awd/Datenbanken/Privat.dtBase2/Files.noindex/eml/7/Some Subject.eml"
	get tags of content id 28379 of database id 3
		--> {}
	get path of content id 28379 of database id 3
		--> "/Users/awd/Datenbanken/Privat.dtBase2/Files.noindex/eml/7/Some Subject.eml"
	do shell script "/usr/bin/env python3 '/Users/awd/Library/Application Scripts/com.devon-technologies.think3/Menu/Import/'replace-attachments.py '/Users/awd/Datenbanken/Privat.dtBase2/Files.noindex/eml/7/Some Subject.eml'"
		--> error number -10004
end tell
tell current application
	do shell script "/usr/bin/env python3 '/Users/awd/Library/Application Scripts/com.devon-technologies.think3/Menu/Import/'replace-attachments.py '/Users/awd/Datenbanken/Privat.dtBase2/Files.noindex/eml/7/Some Subject.eml'"
		--> "[\"some textfile.txt\"]"
end tell
tell application "DEVONthink 3"
	get reference URL of content id 28379 of database id 3
		--> "x-devonthink-item://%3C5F21C7FB-4300-4E8A-B5F7-F2CA56DF4576@pm.me%3E"
	get name of content id 28379 of database id 3
		--> "Some Subject"
	get modification date of content id 28379 of database id 3
		--> date "Samstag, 23. Juli 2022 um 12:26:14"
	get creation date of content id 28379 of database id 3
		--> date "Samstag, 23. Juli 2022 um 12:26:16"
	get addition date of content id 28379 of database id 3
		--> date "Samstag, 23. Juli 2022 um 12:26:23"
	convert record content id 28379 of database id 3 to rich
		--> content id 28408 of database id 3
	get type of content id 28408 of database id 3
		--> rtfd
	get path of content id 28408 of database id 3
		--> "/Users/awd/Datenbanken/Privat.dtBase2/Files.noindex/rtfd/8/Some Subject.rtfd"
	exists attachment of every attribute run of every text of content id 28408 of database id 3
		--> true
end tell
tell application "Finder"
	get POSIX file "/Users/awd/Datenbanken/Privat.dtBase2/Files.noindex/rtfd/8/Some Subject.rtfd"
		--> error number -1728 from POSIX file "/Users/awd/Datenbanken/Privat.dtBase2/Files.noindex/rtfd/8/Some Subject.rtfd"
	get every file of alias "Macintosh HD:Users:awd:Datenbanken:Privat.dtBase2:Files.noindex:rtfd:8:Some Subject.rtfd:"
		--> {document file "TXT.rtf" of document file "Some Subject.rtfd" of folder "8" of folder "rtfd" of folder "Files.noindex" of document file "Privat.dtBase2" of folder "Datenbanken" of folder "awd" of folder "Users" of startup disk, document file "some textfile.txt" of document file "Some Subject.rtfd" of folder "8" of folder "rtfd" of folder "Files.noindex" of document file "Privat.dtBase2" of folder "Datenbanken" of folder "awd" of folder "Users" of startup disk}
	get name of document file "TXT.rtf" of document file "Some Subject.rtfd" of folder "8" of folder "rtfd" of folder "Files.noindex" of document file "Privat.dtBase2" of folder "Datenbanken" of folder "awd" of folder "Users" of startup disk
		--> "TXT.rtf"
	get name of document file "some textfile.txt" of document file "Some Subject.rtfd" of folder "8" of folder "rtfd" of folder "Files.noindex" of document file "Privat.dtBase2" of folder "Datenbanken" of folder "awd" of folder "Users" of startup disk
		--> "some textfile.txt"
	get document file "some textfile.txt" of document file "Some Subject.rtfd" of folder "8" of folder "rtfd" of folder "Files.noindex" of document file "Privat.dtBase2" of folder "Datenbanken" of folder "awd" of folder "Users" of startup disk
		--> alias "Macintosh HD:Users:awd:Datenbanken:Privat.dtBase2:Files.noindex:rtfd:8:Some Subject.rtfd:some textfile.txt"
	move alias "Macintosh HD:Users:awd:Datenbanken:Privat.dtBase2:Files.noindex:rtfd:8:Some Subject.rtfd:some textfile.txt" to alias "Macintosh HD:private:var:folders:7s:0kzpjfrn2bn8h9ltz3xgdrb00000gn:T:TemporaryItems:" with replacing
		--> document file "some textfile.txt" of folder "TemporaryItems" of folder "T" of folder "0kzpjfrn2bn8h9ltz3xgdrb00000gn" of folder "7s" of folder "folders" of folder "var" of item "private" of startup disk
		--> error number 0
end tell
tell application "DEVONthink 3"
	create record with {name:"Some Subject", type:group, creation date:date "Samstag, 23. Juli 2022 um 12:26:16", modification date:date "Samstag, 23. Juli 2022 um 12:26:14", addition date:date "Samstag, 23. Juli 2022 um 12:26:23"} in parent 1 of content id 28379 of database id 3
		--> parent id 28410 of database id 3
end tell
tell application "Finder"
	get document file "some textfile.txt" of folder "TemporaryItems" of folder "T" of folder "0kzpjfrn2bn8h9ltz3xgdrb00000gn" of folder "7s" of folder "folders" of folder "var" of item "private" of startup disk
		--> "Macintosh HD:private:var:folders:7s:0kzpjfrn2bn8h9ltz3xgdrb00000gn:T:TemporaryItems:some textfile.txt"
end tell
tell application "DEVONthink 3"
	import "/private/var/folders/7s/0kzpjfrn2bn8h9ltz3xgdrb00000gn/T/TemporaryItems/some textfile.txt" to parent id 28410 of database id 3
		--> missing value
	display alert "Replace attachments" message "„URL of missing value“ kann nicht als „\"x-devonthink-item://%3C5F21C7FB-4300-4E8A-B5F7-F2CA56DF4576@pm.me%3E\"“ gesetzt werden." as warning
		--> {button returned:"OK"}
	delete current application record content id 28408 of database id 3
		--> true
end tell
Ergebnis:
true

mdbraber · July 23, 2022, 2:40pm

AWD:

tell application "DEVONthink 3"
	import "/private/var/folders/7s/0kzpjfrn2bn8h9ltz3xgdrb00000gn/T/TemporaryItems/some textfile.txt" to parent id 28410 of database id 3
		--> missing value

Something’s going wrong when trying to import the temporary file. Does DT3 have Full Disk Access?

AWD · July 23, 2022, 3:05pm

No, it didn’t. Now it works.

Thank you very much.

chrillek · July 24, 2022, 1:10pm

May I suggest a (still very crude) implementation in JavaScript? It might be a tad easier to manage than the AppleScript/Python implementation, and it doesn’t rely on any external software being installed.

The script currently works for PDF and (some) image types only, and it only handles the first currently selected record. Both can be easily remedied. Also, I have no idea if it does everything your code does. For example, it seems that you remove the attachments from the original mail – I do not do that, though.

Anyway, the code simply reads the EML into a text variable. It then finds all boundaries (i.e. the strings separating the different mail parts) and splits the EML into the different parts at these boundaries.

It then iterates over these parts and writes out those with Content-disposition inline and attachment in their own temporary files. These files are then imported into DT.
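
For comparison with the Python approach earlier in the thread, the same iterate-and-write-out idea can be sketched with the standard-library email module, which takes care of boundaries and decoding. The function name extract_parts and the sample message are assumptions for illustration; importing the written files into DT is left out.

```python
import email
import tempfile
from email import policy
from email.message import EmailMessage
from pathlib import Path


def extract_parts(eml_bytes, out_dir):
    """Write every inline/attachment part that has a filename into out_dir."""
    msg = email.message_from_bytes(eml_bytes, policy=policy.default)
    written = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_disposition() not in ("inline", "attachment"):
            continue
        name = part.get_filename()
        if not name:
            continue
        # keep only the last path component so a crafted name cannot leave out_dir
        target = Path(out_dir) / Path(name).name
        target.write_bytes(part.get_payload(decode=True))
        written.append(target)
    return written


# Hypothetical sample message with a single attachment
sample = EmailMessage()
sample["Subject"] = "Some Subject"
sample.set_content("Body text")
sample.add_attachment(b"hello attachment", maintype="application",
                      subtype="octet-stream", filename="some textfile.txt")

with tempfile.TemporaryDirectory() as tmp:
    written = extract_parts(sample.as_bytes(), tmp)
    names = [p.name for p in written]
    data = written[0].read_bytes()
```

The body part carries no Content-Disposition header, so only the attachment is written out.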

As I said: very crude. But a bit less code, and no dependencies. The whole “write to temp file and import it” part should be replaced by a simple createRecord. But I couldn’t manage to set the data property of this record to the PDF yet. This should work, but apparently doesn’t yet. Or I don’t know how to handle it correctly…

ObjC.import('Foundation');

/* Associate Content-type with a DT record type. This is currently 
  only used to weed out unsupported types */
const typeFromMIME = {
  'application/pdf': 'pdf',
  'image/jpeg': 'image',
  'image/jpg' : 'image',
  'image/png' : 'image',
  'image/tiff': 'tiff',
  'text/html' : 'html'
};

(() => {
  const app = Application("DEVONthink 3")
  app.includeStandardAdditions = true;
  /* Get the filesystem path of the first selected record */
  const path = app.selectedRecords()[0].path();
  
  const error = $();
  
  /* Read the content of the record into an NSString object, return a JavaScript string */
  const content = $.NSString.stringWithContentsOfFileEncodingError($(path), $.NSUTF8StringEncoding, error).js;

  
  /* Build a regular expression to match all boundaries */
  const boundaries = [... content.matchAll(/boundary="?(.*?)"?;?\n/g)];
  if (! boundaries || boundaries.length < 1) {
    console.log(`No boundary found in EML`);
    return;
  }

  const allBoundaries = boundaries.map(b => b[1]).join('|');
  const boundaryRE = new RegExp(`^--(${allBoundaries})?\n`,'ms');
  
  /* Split the content at the boundaries. */
  const parts = content.split(boundaryRE);
  
  /* parts now contains all the message, i.e. body & attachments. Loop over them */
  parts.forEach((p,i) => {
    
    /* Split the current part at empty lines (two consecutive newlines) */
    const subparts = p.split(`\n\n`);
    
    /* Split the first part of the current part into lines, store them in header */
    const header = subparts[0].split(`\n`);
    
    /* Save the main part of the current part in body */
    const body = subparts[1] ;    
    
    /* Handle attachments: the first element of the header must contain a Content-Disposition: */
    if (/Content-Disposition: (inline|attachment);/.test(header[0])) {
        
        /* Get the header lines with the raw filename and MIME types */
        const filenameRaw = header.filter(h => /filename=/.test(h))[0];
        const mimeTypeRaw = header.filter(h => /Content-Type:/.test(h))[0];
        
        /* convert raw filename and MIME type to the correct strings */
        const filename = filenameRaw.match(/filename="?([^"]*)"?/)[1];
        const mimeType = mimeTypeRaw.match(/: (.*)?;/)[1];
        
        /* Get DT's record type corresponding to the current MIME type */
        const DTtype = typeFromMIME[mimeType];
        if (!DTtype) {
          /* ignore all attachments with unsupported MIME types */
          console.log(`mimetype ${mimeType} not supported`);
          return; 
        }
      
        /* Decode the body of the attachment into an NSData object. 
        Remove the last boundary first, otherwise the decode will fail */
        const decodedData = $.NSData.alloc.initWithBase64EncodedStringOptions($(body.replace(/^--.*--$/m,"")), $.NSDataBase64DecodingIgnoreUnknownCharacters);


        /* Save the decoded attachment in a temporary file */
        const tmpPath = `/tmp/${filename}`;
        decodedData.writeToFileAtomically(tmpPath, false);
        
        /* Create a new record from the temporary file */
        app.import(tmpPath);
        
        /* Should remove tmpPath here */
      }
  })
})()

mdbraber · July 24, 2022, 2:56pm

@chrillek thanks for this alternative implementation!

There is definitely a lot that can be improved and doing things without 3rd party languages like Python would be best. On the other hand: handling e-mail (and all their quirks) is daunting. I’ve got over 300K e-mails I’ve run through my script and I can rely on the Python implementation to handle most edge cases quite well.

Right now the 3rd party implementation saves me from having to learn all the quirks of handling e-mail and MIME attachments, so for me it is the best option at the moment.

AWD · July 27, 2022, 4:17pm

Hi mdbraber,

after a few days of using the script, I have to say it is really great. Especially when you just want to keep the mail without the attachments.
I also noticed one thing when using it: the size of the .eml doesn’t change in DEVONthink. Let’s say the .eml was 5MB with its attachment included. After separating, DEVONthink still says the .eml is 5MB and the attachment is 4.8MB, but when I inspect the files in Finder they are 200kB and 4.8MB. Not a big thing generally, but is there a known reason for this?

BR
AWD

mdbraber · August 18, 2022, 10:20am

DT doesn’t re-index the files. To get DT to report the right size I think you need to rebuild the DB.

AWD · May 27, 2025, 6:36pm

Hi, I have not tested the DT4 beta yet. But I was reading a bit here in the forum and noticed that not all scripts work in DT4. Just out of curiosity: is this script compatible with the new DT4?

BLUEFROG · May 27, 2025, 6:49pm

tell application "DEVONthink 3"

Not if it uses this, it wouldn’t be. And that’s not the recommended form anyways. tell application id "DNtp" has been advocated for years.

cgrunenberg · May 28, 2025, 5:45am

DEVONthink 4 (Pro/Server editions) is able to import attachments on its own, therefore this script isn’t required anymore.

rkaplan · May 28, 2025, 7:41am

To keep track of where and when attachments came from, I find it extremely helpful to create a new subgroup containing both the email and the associated attachments. I believe it still requires a script to do that.

It seems to me that for most people an email group would become hopelessly confused if attachments were randomly in the group and not easily associated with the messages they were originally attached to.

AWD · June 27, 2025, 8:16pm

Hi mdbraber, is there any chance that you update your script for DT4? Or is it still working, and I am doing something wrong?

BLUEFROG · June 27, 2025, 8:50pm

What would be the need for the script with DEVONthink 4’s Tools > Import Email Attachments?