Best way to store the "original filename" of a document?

I get many documents from email attachments or from web downloads. I import these documents into DEVONthink and rename them to reflect the content.

So for example if I get an invoice from the Coca-Cola company in an email, it might be called inv18654-20241011.pdf. After importing I rename it to “Coca-Cola Invoice 2024-10-11”.

Before I rename the document, I write the old filename into the comment field, so that I have (via this old name) a link between DEVONthink and my email program.

However, this is not ideal, because the comment field gets utilised for other purposes as well.

Is there another document metadata field for this information (preferably even a non-DEVONthink-proprietary file metadata field)?

Or what other ideas do people have for where to store the “provenance” of a document?

If you’re talking about PDFs: They have a title. If you’re talking about other formats: They may or may not have “metadata” at all. E.g., MultiMarkdown provides for metadata, standard MD does not. Word has metadata, and so does Pages. But for the latter two (and also for PDF): it makes very little sense to store the “name” of the document in there. This “name” is just an arbitrary sequence of bytes used by the local file system. It is not really “metadata” in the sense of conveying any information about the document.

Personally, I couldn’t care less how Deutsche Telekom or Coca Cola or my banks name their files before they send them to me or offer them for downloading. DKB, for example, names some of their information “2f7d5192-a6a3-4f2a-a6a6-db12adfe92fa.pdf”. What would that even tell me?

Am I going to search for these files on their servers? Hardly. Are they even using “files” to store this information? I hope they don’t.

If you insist, you can just define a DT custom meta data field to hold the “file” name and update that when you import the “file”.

Thanks for your detailed reply!

I am only talking about PDFs here, and I want to keep the original filename “somewhere”, but not prominently.

It tells me the link to the original in my mail program.

Sometimes I see an old email and ask myself, “Did I import this document or not?”[1] Then it is good to be able to check whether the document is in DEVONthink – and searching for the original (often, though not always, unique) filename is a good and quick indicator.

Thanks, this might be a good option. – I am just curious what others do if they have the same need to record where a document comes from.


  1. I do have a tag in my mail program for “imported-in-DEVONthink”, but because I have to set this tag manually, things go wrong.

With document, you mean “PDF attachment”? Otherwise, DT never imports the same e-Mail twice. And for a document – well, there’s the “Duplicate” functionality. More reliable, imo, than a file name that can easily be changed.
I don’t rely on file names at all. And having them in a custom metadata field probably does not help with your issue.

Sorry for the late reply.

Well, I have PDF attachments in emails, and sometimes I want to check whether I have put a given attachment into DEVONthink. – The easiest way is to search for the filename, because I can quickly copy that name from my email client and it is often reasonably unique (sometimes I need to add a topic word, which I can usually remember without opening the attachment).

For example, I want to check whether a particular receipt from my ISP is in DEVONthink. The email has an attachment “Invoice_0123452024.pdf”, so I search for Invoice_0123452024 in DEVONthink and – voilà! – there it is, provided I have put the original filename in the comment field.

This works well, but a solution that does not use the comment field would be better. Perhaps I didn’t make that clear enough before, but that’s what my initial post is about.

If you’re not happy with the comment field, you can create and use a custom metadata field.

If I were you, I would try matching the file size.

How do you see the file size in the email?

But I don’t even (easily) see the name of the attachment. Not in Apple’s Mail client, at least.

Good question. I was thinking of the mail attachment class in Apple Mail’s scripting dictionary, which includes a file size property.
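
To make the file-size idea concrete, here is a minimal sketch that lists the name and size of each attachment of the message currently selected in Apple Mail. It uses only the mail attachment properties `name` and `file size` from Mail's scripting dictionary. Two caveats, both assumptions on my part: Mail may fail to report a size for some attachments, and the number Mail reports may not match the `size` DEVONthink shows for the imported file byte for byte, so check that on a few known documents before relying on it.

```applescript
-- Sketch: report name and size of every attachment of the selected message.
tell application "Mail"
	set theSelection to selection
	if theSelection is {} then return "No message selected"
	set theMessage to item 1 of theSelection
	set theLines to {}
	repeat with theAttachment in mail attachments of theMessage
		try
			set end of theLines to (name of theAttachment) & tab & (file size of theAttachment) & " bytes"
		on error
			-- Assumption: some attachments may not expose a size; note that instead of failing
			set end of theLines to (name of theAttachment) & tab & "(size unavailable)"
		end try
	end repeat
end tell

-- Join the collected lines into one block of text
set AppleScript's text item delimiters to linefeed
set theReport to theLines as text
set AppleScript's text item delimiters to ""
return theReport
```

Run it from Script Editor with a message selected; the result pane shows one line per attachment.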

Custom metadata is your best option if you don’t want to utilize the Finder comments. In fact, why don’t you set up a smart rule that adds the filename to a custom attribute upon importing?
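
A smart rule along those lines could run a script like the following sketch. It copies the record's current filename into a custom metadata field, and only when that field is still empty, so a later rename does not overwrite the original. The field identifier `originalfilename` is a hypothetical name chosen for illustration; it has to match a custom metadata field you have defined yourself in DEVONthink's settings.

```applescript
-- Sketch of a smart rule script: remember the filename a record had on import.
-- Assumption: "originalfilename" is a made-up identifier of a custom metadata
-- field that you created yourself; adjust it to your own field.
on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			-- Read the current value; an unset field comes back as missing value
			set theExisting to get custom meta data for "originalfilename" from theRecord
			if theExisting is missing value or theExisting is "" then
				-- Store the filename (including extension) before any renaming happens
				add custom meta data (filename of theRecord) for "originalfilename" to theRecord
			end if
		end repeat
	end tell
end performSmartRule
```

Triggering the rule on import and making it the first action, before any renaming step, keeps the stored value equal to the name the file arrived with.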

Or what other ideas do people have for where to store the “provenance” of a document?

You could store a link (to the message in Apple Mail) in the “URL” field and optionally add the original filename into the “Alias” field of the “Info” pane.

see: CREATING LINKS TO MESSAGES IN APPLE MAIL WITH SHORTCUTS AND APPLESCRIPT

Scripts menu > More Scripts…

And modifiable by you, if desired.

Thanks, but I don’t get it. It’s not about importing a mail but about tagging the imported attachments with a link back to the original mail.

It’s an optional script so someone could import selected emails in Apple Mail and have the message ID in the URL, just as you mentioned…

You could store a link (to the message in Apple Mail) in the “URL” field

OK, there’s also your Script “Set Email URL to Message ID”. Both don’t quite fit here but it’s nice to know they are there. :wink:

Here is a SmartRule/Script combo that automates my proposed procedure:
Attach it to a special group in the inbox to avoid processing every import.

Now when a file is dragged over from Apple Mail and dropped into the group this should happen:

  • the original name is stored in the alias field
  • the domain of the sender’s E-mail address is added to the name
  • the message id of the mail is stored into the URL field
  • the file/record is moved to the inbox

Now when you want to check for the existence of a file/attachment, you can do a specific alias search, just as you did with the name. Or you can use the script by John Gruber to copy the message ID of your e-mail to the clipboard and paste that into the search box of DT.

The script is not elaborate, but it will hopefully show the basic procedure!

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			-- brute force hack to get the (spotlight) metadata of the already imported file
			-- (quoted form keeps paths containing spaces or apostrophes from breaking the shell command)
			set thePath to path of theRecord
			set theMetaData to paragraphs of (do shell script ¬
				"mdimport -d3 -n -t " & quoted form of thePath & " | awk -F'[;<>\"]' '/kMDItemOriginMessageID|kMDItemOriginSenderHandle/ {gsub(/.*:PR:.*= \"?/,\"\"); gsub(/.*@/,\"\"); print $1}'")
			-- quick check type/length
			try
				set theOriginId to item 1 of theMetaData as number
				set theOriginDomain to characters 1 thru -1 of item 2 of theMetaData as text
			on error
				return
			end try
			-- get message ID string from Apple Mail 
			tell application "Mail"
				set theSelection to selection
				if theSelection is not {} and id of item 1 of theSelection = theOriginId then
					set theMessageId to message id of item 1 of theSelection
				end if
			end tell
			-- set name and alias
			set aliases of theRecord to name of theRecord
			set name of theRecord to name of theRecord & " @" & theOriginDomain
			-- set URL only if theMessageId is defined 
			try
				set URL of theRecord to "message://" & "%3c" & theMessageId & "%3e"
			end try
		end repeat
	end tell
end performSmartRule

Thanks, @BLUEFROG. Good to know what the expert says. :smile:

Thanks for the smart rule. One question: Why do you set a tag for this? Wouldn’t it be enough to check whether the custom field is empty or not?

Wow, this is an interesting idea! Storing the original filename in the Aliases field! Sounds too simple to be good…

What are the ramifications of this approach?

Since I don’t use wiki links with aliases I don’t know … :wink:
You can also check the option “Exclude from Wiki Linking” to further safeguard this approach.

But keeping the original filename might not be necessary at all. If your attachments have the URL field populated, a search for the message URL might be sufficient to see whether the attachments have already been imported into DT.
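
That check can be scripted as well. The sketch below builds the same `message://%3c…%3e` URL the smart rule script writes into the URL field and asks DEVONthink for records carrying it, using the `lookup records with URL` command. It assumes the records were imported with that URL format and that the lookup runs against the current database; like the script above, it does not percent-encode the message ID itself.

```applescript
-- Sketch: has any attachment of the selected Mail message been imported already?
tell application "Mail"
	set theSelection to selection
	if theSelection is {} then return "No message selected"
	set theMessageId to message id of item 1 of theSelection
end tell

-- Same URL format the smart rule script stores in the URL field
set theURL to "message://%3c" & theMessageId & "%3e"

tell application id "DNtp"
	-- Assumption: searches the current database only
	set theMatches to lookup records with URL theURL
	if theMatches is {} then return "Not found: " & theURL
	set theNames to {}
	repeat with theMatch in theMatches
		set end of theNames to name of theMatch
	end repeat
	return theNames
end tell
```

The result is either the list of matching record names or a “Not found” line containing the URL that was looked up.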

It was just an example of using a criterion to avoid matching items over and over again.
