Separate/import e-mail attachments for better search V2

Hi Bluefrog,

For me, it was nice to be able to separate a mail from its attachment. That way, the attachment was searchable within DEVONthink and directly accessible, but still linked to the original mail and not occupying twice the storage.
The new function neither links the attachment to the mail nor removes it from the mail, so the attachment is duplicated in the database without any relation to its mail. This adds up quickly if you regularly deal with attachments of 2-5 MB.

DEVONthink 4.0 includes a new import attachments of record ... to ... AppleScript command.

ChatGPT helped me modify the initial script to work with DT4. Here is the modified version.

It expects the Python script in the same location as the AppleScript.

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

property ca : a reference to current application
property pythonCmd : "/usr/bin/env python3"
property replacedTagName : "attachments-extracted"

tell application "System Events"
	set scriptPath to path of (path to me)
	set parentFolder to POSIX path of (container of file scriptPath)
end tell

set pythonScriptPath to parentFolder & "/replace-attachments.py"

tell application "Finder"
	set replaceCmd to pythonCmd & " " & quoted form of pythonScriptPath & " "
end tell

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	
	repeat with theRecord in theSelection
		repeat 1 times
			-- display dialog "Processing: " & (name of theRecord)
			
			set recordPath to path of theRecord
			-- display dialog "Path: " & recordPath & return & "Type: " & (type of theRecord as rich text) & return & "Tags: " & (tags of theRecord as rich text)
			
			if (type of theRecord is email or type of theRecord is unknown) and recordPath ends with ".eml" and (tags of theRecord does not contain replacedTagName) then
				try
					set foundAttachmentsJSON to do shell script replaceCmd & (quoted form of recordPath)
				on error errMsg
					display dialog "Error in Python script:" & return & errMsg
					exit repeat
				end try
				
				if foundAttachmentsJSON is equal to "" then
					display dialog "No attachments detected by the Python script."
					exit repeat
				end if
				
				set foundAttachments to my fromJSON(foundAttachmentsJSON)
				-- display dialog "Attachments found: " & (foundAttachments as rich text)
				
				set recordReferenceURL to reference URL of theRecord
				set recordSubject to name of theRecord
				set recordModificationDate to modification date of theRecord
				set recordCreationDate to creation date of theRecord
				set recordAdditionDate to addition date of theRecord
				set recordGroup to missing value
				set extractedAttachments to {}
				
				set rtfRecord to convert record theRecord to rich
				-- display dialog "RTF conversion type: " & (type of rtfRecord as rich text)
				
				if type of rtfRecord is RTFD then
					set rtfPath to path of rtfRecord
					
					tell rich text of rtfRecord
						tell application "Finder"
							set rtfAttachmentList to every file in ((POSIX file rtfPath) as alias)
							-- display dialog "Number of files in RTF: " & (count of rtfAttachmentList)
							
							repeat with rtfAttachment in rtfAttachmentList
								set rtfAttachmentName to name of rtfAttachment as string
								-- display dialog "File in RTF: " & rtfAttachmentName
								-- display dialog "Comparing:" & return & "RTF file: " & rtfAttachmentName & return & "JSON attachments: " & (foundAttachments as text) & return & "RTF (lowercase): " & my lowercaseText(rtfAttachmentName)
								set nameFound to false
								repeat with itemName in foundAttachments
									if my normalizeText(rtfAttachmentName) = my normalizeText(itemName) then
										set nameFound to true
										exit repeat
									end if
								end repeat
								
								if nameFound then
									-- display dialog "MATCH: " & rtfAttachmentName
									-- from here on: move, import, etc.
								end if
								if my lowercaseText(rtfAttachmentName) is in (my lowercaseList(foundAttachments)) then
									-- display dialog "MATCH: " & rtfAttachmentName
									
									set rtfAttachment to move (rtfAttachment as alias) to tmpFolder with replacing
									
									tell application id "DNtp"
										if recordGroup is missing value then
											set recordGroup to create record with {name:recordSubject, type:group, creation date:recordCreationDate, modification date:recordModificationDate, addition date:recordAdditionDate} in (parent 1 of theRecord)
										end if
										
										set movedPath to POSIX path of (rtfAttachment as alias)
										-- display dialog "Importing file: " & movedPath
										set importedItem to import path movedPath to recordGroup
										set URL of importedItem to recordReferenceURL
										set modification date of importedItem to recordModificationDate
										set creation date of importedItem to recordCreationDate
										set end of extractedAttachments to {rtfAttachmentName, ((reference URL of importedItem) as string)}
										-- log message "Imported: " & rtfAttachmentName info "Attachment extraction" record importedItem
									end tell
								end if
							end repeat
						end tell
						
						if (count of extractedAttachments) > 0 then
							set extractedAttachmentsJSON to my toJSON(extractedAttachments)
							
							tell application id "DNtp"
								move record theRecord to recordGroup
								do shell script replaceCmd & "-r " & quoted form of extractedAttachmentsJSON & " " & quoted form of recordPath
								set tags of theRecord to (tags of theRecord) & {replacedTagName}
								-- log message "Attachments replaced in: " & recordSubject info "Attachment extraction" record theRecord
							end tell
						end if
					end tell
					
					delete record rtfRecord
				else
					display dialog "RTF conversion did not produce an RTFD."
				end if
			end if
		end repeat
	end repeat
end tell

on normalizeText(t)
	-- Strips leading/trailing whitespace and converts to lowercase
	set cleaned to do shell script "/bin/echo " & quoted form of t & " | tr '[:upper:]' '[:lower:]' | sed 's/^ *//;s/ *$//'"
	return cleaned
end normalizeText

on fromJSON(strJSON)
	set {x, e} to ca's NSJSONSerialization's JSONObjectWithData:((ca's NSString's stringWithString:strJSON)'s dataUsingEncoding:(ca's NSUTF8StringEncoding)) options:0 |error|:(reference)
	if x is missing value then error e's localizedDescription() as text
	if e ≠ missing value then error e
	if x's isKindOfClass:(ca's NSDictionary) then
		return x as record
	else
		return x as list
	end if
end fromJSON

on toJSON(theData)
	set theJSONData to ca's NSJSONSerialization's dataWithJSONObject:theData options:0 |error|:(missing value)
	set JSONstr to (ca's NSString's alloc()'s initWithData:theJSONData encoding:(ca's NSUTF8StringEncoding)) as text
	return JSONstr
end toJSON

on lowercaseText(t)
	return (do shell script "/bin/echo " & quoted form of t & " | tr '[:upper:]' '[:lower:]'")
end lowercaseText

on lowercaseList(theList)
	set outList to {}
	repeat with i in theList
		set end of outList to my lowercaseText(i)
	end repeat
	return outList
end lowercaseList

What is the intention of the repeat 1 times “loop”? And why all this stuff instead of using the command @cgrunenberg suggested?

The script does more than just importing something. It takes an already imported .eml, scans it for attachments, imports them into the database, deletes them from the .eml file, and places a link to the file in the .eml and the URL of the mail in the URL field of the file to link them together.
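
For readers who want to see the Python half of that workflow in isolation, here is a minimal sketch using only the standard-library `email` package. It is not the actual replace-attachments.py; the function names, the sample message, the `.txt` stub naming and the `x-devonthink-item://EXAMPLE-UUID` link are all assumptions made up for illustration. The first function prints the attachment names as a JSON list (which is what the AppleScript above parses with fromJSON), the second swaps each attachment for a small text stub that carries the link.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_names(raw_eml, dispositions=("attachment",)):
    """Return the filenames of all parts with a matching Content-Disposition."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    names = []
    for part in msg.walk():
        if part.get_content_disposition() in dispositions and part.get_filename():
            names.append(part.get_filename())
    return names


def replace_with_links(raw_eml, links):
    """Replace each attachment named in `links` with a text stub holding its link.

    `links` maps attachment filename -> item link (hypothetical format).
    """
    msg = message_from_bytes(raw_eml, policy=policy.default)
    for part in msg.walk():
        name = part.get_filename()
        if part.get_content_disposition() == "attachment" and name in links:
            # set_content() clears the old payload and Content-* headers first.
            part.set_content(
                f"Attachment moved to {links[name]}\n",
                disposition="attachment",
                filename=name + ".txt",
            )
    return msg.as_bytes()


def build_sample_eml():
    """Build a tiny message with one PDF attachment (illustrative data only)."""
    msg = EmailMessage()
    msg["Subject"] = "Einladung"
    msg["From"] = "sender@example.com"
    msg["To"] = "me@example.com"
    msg.set_content("See the attached invitation.")
    msg.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                       subtype="pdf", filename="Einladung.pdf")
    return msg.as_bytes()


if __name__ == "__main__":
    raw = build_sample_eml()
    print(json.dumps(attachment_names(raw)))
    stripped = replace_with_links(
        raw, {"Einladung.pdf": "x-devonthink-item://EXAMPLE-UUID"})
    print(json.dumps(attachment_names(stripped)))
```

The space saving comes from the second step: once the payload is replaced by a one-line stub, the .eml no longer carries the attachment bytes, while the stub still points at the imported file.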

Actually, I don’t have a clue if this is the most efficient way of doing this. Probably not. I just took the old script from mdbraber for DT3 and made it work again with DT4.

If this functionality makes it directly into DT, I would be happy to use it.

Well, it is so convoluted that the intention and algorithm are difficult to recognize. E.g.: telling Finder to set a string to a value? The repeat 1 times loop? Converting an email to RTFD so that a Python script can access the attachments saved in the conversion process? Lots of smoke for a tiny fire. And some comments would greatly help to understand the code.

Why not something like (in symbolic code)

repeat for r in selected records
  if (type of r is email or (type of r is unknown and extension of r is '.eml'))
     import attachments record r to target someGroup
     post process the attachments in whatever way
     set tag of record to "has been processed"
  end if
end repeat

That takes, of course, the fun out of it (convert EML to RTFD, read RTFD as JSON, parse JSON, lowercase whatever …). But right now, it looks like a very, very convoluted way requiring many tools (Python and several of its modules) to solve a not so complicated problem.

I fully understand you. I would also prefer a more integrated solution. The script is not mine, and I am not a very skilled programmer; AppleScript in particular is like witchcraft to me. It reads so easily, but actually getting it to work feels so random, in my opinion. I would never have been able to debug this without ChatGPT or a similar tool, at least not with an acceptable amount of time and effort.

I kept it alive, because it scratches an itch for me, but if the DT developers integrate a functionality to separate and remove attachments from a mail and link them together I would be absolutely happy to simplify this. Just importing and not deleting from the original mail doesn’t solve the problem for me.

Regarding the repeat 1 times loop: it was already in the original script, and its comment got lost at some point.
I don’t know what it is good for.
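
For what it's worth, the usual reason for this idiom is that AppleScript has no `continue` statement. Wrapping the loop body in a single-iteration loop lets `exit repeat` leave only the inner loop, so the outer loop carries on with the next record. That matches how the script above uses `exit repeat` after its error dialogs. A rough Python rendering of the same control flow, with made-up record names, might look like this:

```python
def process_selection(records):
    """Mimic AppleScript's 'repeat 1 times' trick for skipping an item."""
    handled = []
    for record in records:            # repeat with theRecord in theSelection
        for _ in range(1):            # repeat 1 times
            if not record.endswith(".eml"):
                break                 # exit repeat: leaves only the inner loop
            handled.append(record)
        # Execution resumes here, so the outer loop moves on to the next record.
    return handled


if __name__ == "__main__":
    print(process_selection(["a.eml", "b.pdf", "c.eml"]))
```

In Python one would simply write `continue`; the nested loop is only there to mirror the AppleScript structure.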

Thank you @AWD and @mdbraber for this great work!

I’ve tried getting it to run, but I always stumble on some permission issues: Finder asks me to grant it some higher privileges with Touch ID, which I grant, but apparently I don’t grant enough, since I get the error:

	move alias "Macintosh HD:Users:me:Datenbanken:EmailsDevonThinkDB.dtBase2:Files.noindex:rtfd:e:Einladung.rtfd:bild.JPG" to alias "Macintosh HD:private:var:folders:6b:98zkkkjs1tl4xwid31r84woh0000gn:T:TemporaryItems:" with replacing
		--> current application
		--> error "Die Aktion konnte nicht abgeschlossen werden, da du nicht die erforderlichen Zugriffsrechte hast." number -5000

I gave DT4 full disk access, but I suspect there is no link between the two, since this is in the Finder part.

If you have any idea how to get it to work, that would be great!

Just for the record, since the question appeared in the conversation, here are my reasons why I’d like to use this script instead of the built-in functions (and I’d much rather have the built-in functions do it than fiddle with several scripts):

  1. Separate attachments are treated with OCR and are therefore searchable
  2. With the treatment of this script, there is a backlink between the file and the email (not with the inbuilt function)
  3. In addition, the email and the attachment are grouped together (not with the inbuilt function)
  4. When the attachment actually gets removed from the .eml, this allows for space savings through deduplication (e.g. all the forwarded mails with attachments would otherwise take up much more space)
  5. Search works better in separate attachments, even indexed ones (shows where the search text was found)
  6. The file names of separate attachments can be found when searched for (currently, the filenames are not indexed when inside a message)

Have a great day!

FYI, I also created a feature-request:

Hi @smiling, did you run this script in Script Editor? If so, you may have to grant Script Editor full disk access while you are testing the script.

Thank you @AWD! That was it! I had thought of it, but since Script Editor is in a subfolder of the Applications folder, I thought that maybe it wasn’t possible to give Script Editor full disk access. Now it works perfectly!

By the way, for anyone coming across this post, if you get an error

tell application "DEVONthink"
	import path "/private/var/folders/6b/(…).pdf" to parent id 83694 of database id 2
		--> missing value
Ergebnis:
error "„URL of missing value“ kann nicht als „\"x-devonthink-item://%(…)\"“ gesetzt werden." number -10006 from URL of missing value

then it’s because DT doesn’t have full disk access.

Now I’ll try to see if I can make a workflow that works for me.

All the best!

Current state: it works very well on some emails, on others the python script somehow doesn’t detect an attachment:

INFO No attachments found to replace

Have yet to debug the code more in depth, no idea why it’s happening.

The Python script has a threshold for the minimum size of an attachment. By default it is set to 150 kB. That could be the issue. If not: in the cases where it fails, what was the file type of the attachment?

Thank you @AWD, I hadn’t realised there was a threshold. Since the threshold only applies to images, it wasn’t the reason.

The reason was that by default, the python script only takes “real” attachments, not inline-attachments. And in the cases where it didn’t work, the attachments were inline attachments. By adding ‘inline’ on line 105, the issue was resolved:

if not part.get_content_disposition() in ['attachment', 'inline']:
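
To make the difference visible, here is a small self-contained sketch (not the original script; the helper names and the sample message are invented for illustration). It builds a message whose PDF is marked `inline` and shows that a filter accepting only 'attachment' finds nothing, while the extended filter finds the file.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def filenames_with_disposition(raw_eml, accepted):
    """List filenames of parts whose Content-Disposition is in `accepted`."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    found = []
    for part in msg.walk():
        # Same test as in the modified line, written with 'not in'.
        if part.get_content_disposition() not in accepted:
            continue
        found.append(part.get_filename())
    return found


def sample_with_inline_pdf():
    """Build a message whose only PDF is sent as an inline part."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg.set_content("Invoice below.")
    msg.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                       subtype="pdf", filename="invoice.pdf",
                       disposition="inline")
    return msg.as_bytes()


if __name__ == "__main__":
    raw = sample_with_inline_pdf()
    print(filenames_with_disposition(raw, ["attachment"]))
    print(filenames_with_disposition(raw, ["attachment", "inline"]))
```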

Have a great day!

Why would you want to extract and save an inline attachment? It is inline so that you can see (in general) an image directly in the e-mail. Often, these are company logos and such.

Apparently, some senders also put PDF and other files as inline attachments. I indeed don’t need company logos to be extracted, but that’s taken care of by the image filter, that filters out images below a certain size.
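
A rough sketch of how such a filter can work, assuming the threshold is checked against the decoded payload size and that "150 kB" means 150,000 bytes (both are my assumptions, as are the function and file names; the real script may differ):

```python
from email.message import EmailMessage

MIN_IMAGE_BYTES = 150 * 1000  # assumption: "150 kB" read as 150,000 bytes
ACCEPTED_DISPOSITIONS = ("attachment", "inline")


def should_extract(part, min_image_bytes=MIN_IMAGE_BYTES):
    """Keep attachment/inline parts, but drop images below the size threshold."""
    if part.get_content_disposition() not in ACCEPTED_DISPOSITIONS:
        return False
    payload = part.get_payload(decode=True) or b""
    if part.get_content_maintype() == "image" and len(payload) < min_image_bytes:
        return False  # e.g. a small company logo
    return True


def build_sample():
    """Message with a small inline logo, a large photo and an inline PDF."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg.set_content("Please find the invoice below.")
    msg.add_attachment(b"\x89PNG" + b"\0" * 1000, maintype="image",
                       subtype="png", filename="logo.png",
                       disposition="inline")
    msg.add_attachment(b"\xff\xd8" + b"\0" * 200_000, maintype="image",
                       subtype="jpeg", filename="photo.jpg")
    msg.add_attachment(b"%PDF-1.4" + b"\0" * 500, maintype="application",
                       subtype="pdf", filename="invoice.pdf",
                       disposition="inline")
    return msg


if __name__ == "__main__":
    kept = [p.get_filename() for p in build_sample().walk() if should_extract(p)]
    print(kept)
```

Because the size check only applies to parts with an image content type, the small inline PDF is kept while the small inline logo is skipped.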

Weird. The idea of an inline attachment was (I think) that the client could display it directly. Which is difficult to imagine with a PDF. But e-mail is weird anyway.

Apple Mail on the Mac does this by default. Really annoying.

Indeed. I think it does this with single-page PDFs. And the option cannot be modified, afaik…

There used to be a Mail plugin which would make Mail attachments work like “dumb” attachments, but that was many years ago, and Apple likely blows the world up or whatever these days if you ask about Mail plugins.

Sean