Separate imported e-mail attachments for better search

This sounds like a tricky operation, actually. If several attachments have no name or share the same name, it might be difficult to figure out which file is which.

Maybe not that tricky, but I might be overlooking something, so I’m open to tips. The Python script takes the reference URL and filename directly from what the AppleScript you wrote has done (I just write the separated attachment name + reference URL to a Finder comment so they can be linked to the .eml file).

I don’t work out any attachment names etc. in the Python script (that’s done by the import part of the AppleScript). With the Python script I just remove all attachments above a certain size from the .eml file and replace them with a bit of HTML with links to the DT items from the Finder comment:

Example with an .eml file with ‘attachment1.doc’ and ‘attachment2.doc’:

set commentString to ((filename of importedFile) as string) & ";" & ((reference URL of importedFile) as string) & "|" & commentString

This gives me attachment1.doc;x-devonthink-item://uuid1|attachment2.doc;x-devonthink-item://uuid2 in the Finder comment of the .eml file, which gets processed in the Python script in these parts:

comment = bplist.parse(xattr.getxattr(os.path.join(path, filename), 'com.apple.metadata:kMDItemFinderComment')).rstrip("|")

and

import email.mime.text

def get_replace_text(comment):
    """Return a MIMEText HTML part that lists links to the separated attachments."""
    replace_text = ""
    for attachment in comment.split("|"):
        # Split on the last ";" so a semicolon in the filename does not break parsing.
        filename, link = attachment.rsplit(";", 1)
        # Prepend each item, which reverses the order used in the Finder comment.
        replace_text = "\n\n<li><a href='{}'>{}</a></li>\r\n".format(link, filename) + replace_text
    return email.mime.text.MIMEText("<br/><br/><hr><br/><strong>Attachments:</strong><ul>{}</ul>".format(replace_text), 'html')
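
For anyone adapting this, here is a minimal sketch of the same comment handling using only the standard library. It assumes the Finder comment xattr value is a plist-encoded string (read here with `plistlib` instead of the `bplist` helper above) and that reference URLs never contain a `;`. The function names `decode_finder_comment`, `parse_comment` and `build_attachment_list` are hypothetical, not part of the original script. Unlike the function above, it keeps the entries in the order they appear in the comment and HTML-escapes the filenames.

```python
import html
import plistlib


def decode_finder_comment(raw):
    """Decode the plist-encoded xattr bytes and drop any trailing '|'."""
    return plistlib.loads(raw).rstrip("|")


def parse_comment(comment):
    """Split 'name;url|name;url' into a list of (filename, link) pairs."""
    pairs = []
    for entry in comment.split("|"):
        # Split on the last ';' so a semicolon inside a filename survives.
        filename, sep, link = entry.rpartition(";")
        if not sep:
            continue  # empty or malformed entry without a link: skip it
        pairs.append((filename, link))
    return pairs


def build_attachment_list(pairs):
    """Render the pairs as an HTML list, escaping names and links."""
    items = "".join(
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(link, quote=True), html.escape(name)
        )
        for name, link in pairs
    )
    return "<strong>Attachments:</strong><ul>{}</ul>".format(items)
```

Escaping matters mostly for filenames containing `&` or `<`, which would otherwise end up as broken HTML in the rewritten message.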

I’m content with the group storing the original .eml file and attachment files.
Why are we modifying the .eml file?

attachments as ‘proper’ DT records

The attachments work for me as part of the group, or as independent DT records

The .eml files encapsulate all the attachments. You won’t be able to find them as separate documents in DT, and you also can’t easily search ‘inside’ them there. E.g. I’ve got loads of interesting documents (mostly PDF and Office documents) sent to me over the years. I can find them when I search for them, but I don’t know why they surface, because DT can’t look ‘inside’ the attachment (only the .eml file).

By separating the attachments I can treat them as ‘regular’ DT documents in regards to surfacing the content.

Modifying the original .eml is definitely not needed if you just want to include both the original .eml file and the attachment. But my e-mail archive spans 22 years and is ~35GB. Having all attachments in there twice is a bit too much so that’s why I’m doing it this way. Probably not needed for most, but works well for me.
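
To give an idea of the stripping step, here is a rough sketch with Python’s standard `email` package. It is not my actual script: the `strip_large_attachments` name and the `SIZE_LIMIT` threshold are made up for illustration, and it only looks at top-level attachments of a multipart message (nested ones are left alone).

```python
import email
from email import policy
from email.message import EmailMessage

SIZE_LIMIT = 100 * 1024  # hypothetical threshold in bytes


def strip_large_attachments(msg, links_html, limit=SIZE_LIMIT):
    """Remove attachments larger than `limit` and append an HTML note.

    Returns the filenames of the attachments that were removed.
    """
    oversized = [
        part
        for part in msg.iter_attachments()
        # Decoded size; multipart attachments decode to None and count as 0.
        if len(part.get_payload(decode=True) or b"") > limit
    ]
    if not oversized:
        return []
    parts = msg.get_payload()  # live list of sub-parts of a multipart message
    for part in oversized:
        parts.remove(part)
    note = EmailMessage()
    note.set_content(links_html, subtype="html")
    msg.attach(note)
    return [part.get_filename() or "unnamed" for part in oversized]
```

The message would be loaded with `email.message_from_bytes(data, policy=policy.default)` so that `iter_attachments()` is available, and written back with `msg.as_bytes()` afterwards.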

The attachments of emails have actually been indexed since version 3.0, so a toolbar search should find the email. But the Search inspector only supports the document itself, not its attachments.

Yes, you’re absolutely right - sorry for not making myself clear. Why I’m doing it this way is because when I have e.g. an e-mail with a PDF attachment which contains a specific phrase I’m searching for, it does turn up in the search results, but I can’t find the occurrences without separately opening the attachment and redoing the search e.g. in Preview (see How to find search occurence in an e-mail attachment?). But maybe there’s an easier way I’m not seeing?

How are you getting the attachments as independent DT records? By running an AppleScript like the one above or some other way?

That’s probably the easiest option currently.

Why don’t you just drag and drop the attachments out of the email into the database? They’re indexed and searchable as individual documents then. :thinking:

I’m using the script provided by DT (Add message(s) & attachments to DEVONthink.scpt)

Because my e-mail archive spans ~300.000 messages, so it would be quite some dragging and dropping :wink:

True but I’m guessing you don’t need to import attachments from all 300,000 emails. :slight_smile:

Thank you @mdbraber for your work. I am recently “back” to DT and was hoping to use it to quickly wrangle a pile of .eml files with attachments. I copied your script from 5/11 to the DT Scripts folder [Library/Application Scripts/com.d-t.think3/Menu]. When I run it on a selected email in DT, it creates a group and places the message into the group, but does not appear to separate the attachments (PDFs) or include them in the newly created group. Am I missing a critical step? (I wandered into these chains last night and am trying to follow, but am not at the level of also figuring out the Python integration!)

Many thanks.

Try running the script from Script Editor (a separate app on your system) and see what the debug output says (click buttons 1 and 2 to get the debug output).

tell application "DEVONthink 3"
get selection
	--> {content id 7383 of database id 3}
path to temporary items
	--> alias "Macintosh HD:private:var:folders:m6:k_6qdnr56tv0n68zkm08p55w0000gn:T:TemporaryItems:"
get type of content id 7383 of database id 3
	--> unknown
get path of content id 7383 of database id 3
	--> "/Users/aaronhand/Documents/Misison Viejo Elections.dtBase2/Files.noindex/eml/6/Fw_ Public Disclosure - Public Consent.eml"
convert record content id 7383 of database id 3 to rich
	--> content id 7429 of database id 3
get reference URL of content id 7383 of database id 3
	--> "x-devonthink-item://18FECA34-9069-4AE9-B30C-96572BC0725C"
get type of content id 7429 of database id 3
	--> rtfd
get path of content id 7429 of database id 3
	--> "/Users/aaronhand/Documents/Misison Viejo Elections.dtBase2/Files.noindex/rtfd/2/Fw_ Public Disclosure - Public Consent.rtfd"
get parent 1 of content id 7383 of database id 3
	--> parent id 7425 of database id 3
get name of content id 7383 of database id 3
	--> "Fw_ Public Disclosure - Public Consent"
get modification date of content id 7383 of database id 3
	--> date "Tuesday, June 8, 2021 at 12:15:39 AM"
get creation date of content id 7383 of database id 3
	--> date "Friday, May 21, 2021 at 2:15:07 PM"
get addition date of content id 7383 of database id 3
	--> date "Tuesday, June 8, 2021 at 12:41:07 AM"
exists attachment of every attribute run of every text of content id 7429 of database id 3
	--> true
end tell
tell application "Finder"
	get POSIX file "/Users/aaronhand/Documents/Misison Viejo Elections.dtBase2/Files.noindex/rtfd/2/Fw_ Public Disclosure - Public Consent.rtfd"
		--> error number -1728 from POSIX file "/Users/aaronhand/Documents/Misison Viejo Elections.dtBase2/Files.noindex/rtfd/2/Fw_ Public Disclosure - Public Consent.rtfd"
	get every file of alias "Macintosh HD:Users:aaronhand:Documents:Misison Viejo Elections.dtBase2:Files.noindex:rtfd:2:Fw_ Public Disclosure - Public Consent.rtfd:"
		--> {document file "Deputy AG Grievance highlights.pdf" of document file "Fw_ Public Disclosure - Public Consent.rtfd" of folder "2" of folder "rtfd" of folder "Files.noindex" of document file "Misison Viejo Elections.dtBase2" of folder "Documents" of folder "aaronhand" of folder "Users" of startup disk, document file "TXT.rtf" of document file "Fw_ Public Disclosure - Public Consent.rtfd" of folder "2" of folder "rtfd" of folder "Files.noindex" of document file "Misison Viejo Elections.dtBase2" of folder "Documents" of folder "aaronhand" of folder "Users" of startup disk, document file "multi card.png" of document file "Fw_ Public Disclosure - Public Consent.rtfd" of folder "2" of folder "rtfd" of folder "Files.noindex" of document file "Misison Viejo Elections.dtBase2" of folder "Documents" of folder "aaronhand" of folder "Users" of startup disk}
	get document file "Deputy AG Grievance highlights.pdf" of document file "Fw_ Public Disclosure - Public Consent.rtfd" of folder "2" of folder "rtfd" of folder "Files.noindex" of document file "Misison Viejo Elections.dtBase2" of folder "Documents" of folder "aaronhand" of folder "Users" of startup disk
		--> "Macintosh HD:Users:aaronhand:Documents:Misison Viejo Elections.dtBase2:Files.noindex:rtfd:2:Fw_ Public Disclosure - Public Consent.rtfd:Deputy AG Grievance highlights.pdf"
	get POSIX file "/Users/aaronhand/Documents/Misison Viejo Elections.dtBase2/Files.noindex/rtfd/2/Fw_ Public Disclosure - Public Consent.rtfd/Deputy AG Grievance highlights.pdf"
		--> error number -1728 from POSIX file "/Users/aaronhand/Documents/Misison Viejo Elections.dtBase2/Files.noindex/rtfd/2/Fw_ Public Disclosure - Public Consent.rtfd/Deputy AG Grievance highlights.pdf"
	move alias "Macintosh HD:Users:aaronhand:Documents:Misison Viejo Elections.dtBase2:Files.noindex:rtfd:2:Fw_ Public Disclosure - Public Consent.rtfd:Deputy AG Grievance highlights.pdf" to alias "Macintosh HD:private:var:folders:m6:k_6qdnr56tv0n68zkm08p55w0000gn:T:TemporaryItems:" with replacing
		--> document file "Deputy AG Grievance highlights.pdf" of folder "TemporaryItems" of folder "T" of folder "k_6qdnr56tv0n68zkm08p55w0000gn" of folder "m6" of folder "folders" of folder "var" of item "private" of startup disk
		--> error number 0
	get document file "Deputy AG Grievance highlights.pdf" of folder "TemporaryItems" of folder "T" of folder "k_6qdnr56tv0n68zkm08p55w0000gn" of folder "m6" of folder "folders" of folder "var" of item "private" of startup disk
		--> "Macintosh HD:private:var:folders:m6:k_6qdnr56tv0n68zkm08p55w0000gn:T:TemporaryItems:Deputy AG Grievance highlights.pdf"
end tell
tell application "DEVONthink 3"
	create record with {name:"Fw_ Public Disclosure - Public Consent", type:group, modification date:date "Tuesday, June 8, 2021 at 12:15:39 AM", creation date:date "Friday, May 21, 2021 at 2:15:07 PM", addition date:date "Tuesday, June 8, 2021 at 12:41:07 AM"} in parent id 7425 of database id 3
		--> parent id 7431 of database id 3
	import "/private/var/folders/m6/k_6qdnr56tv0n68zkm08p55w0000gn/T/TemporaryItems/Deputy AG Grievance highlights.pdf" to parent id 7431 of database id 3
		--> missing value
end tell
tell application "Finder"
	get document file "TXT.rtf" of document file "Fw_ Public Disclosure - Public Consent.rtfd" of folder "2" of folder "rtfd" of folder "Files.noindex" of document file "Misison Viejo Elections.dtBase2" of folder "Documents" of folder "aaronhand" of folder "Users" of startup disk
		--> "Macintosh HD:Users:aaronhand:Documents:Misison Viejo Elections.dtBase2:Files.noindex:rtfd:2:Fw_ Public Disclosure - Public Consent.rtfd:TXT.rtf"
	get document file "multi card.png" of document file "Fw_ Public Disclosure - Public Consent.rtfd" of folder "2" of folder "rtfd" of folder "Files.noindex" of document file "Misison Viejo Elections.dtBase2" of folder "Documents" of folder "aaronhand" of folder "Users" of startup disk
		--> "Macintosh HD:Users:aaronhand:Documents:Misison Viejo Elections.dtBase2:Files.noindex:rtfd:2:Fw_ Public Disclosure - Public Consent.rtfd:multi card.png"
end tell
tell application "DEVONthink 3"
	move current application record content id 7383 of database id 3 to parent id 7431 of database id 3
		--> content id 7383 of database id 3
	delete current application record content id 7429 of database id 3
		--> true
end tell
Result:
true

If it helps, I cleared that temp folder and re-ran the script on the same item. It looks like the same output in Script Editor, though I did confirm that the attachment was extracted and saved into the temp folder.

It’s not easy to debug remotely, but for some reason the “import” statement is returning “missing value”, as if it isn’t able to import the attachment. Maybe the Log or Activity window is showing some information about why the attachment can’t be imported?

Interesting – lack of permissions? I am able to navigate to that folder and interact with the file. Is macOS somehow preventing DT or AppleScript from accessing the folder (even though they created the contents in it)?

I was in the process of granting DT full control when I knocked my Nalgene of water with perfect precision to irrigate my MBP. So I’ll helpfully report back one day whether that fixes things, after we see about the logic board.

DEVONtech don’t advocate that solution.

(On a more serious note: I wish you a speedy, preferably free recovery without loss of data. Thumbs pressed).