Email import and link between emails and attachments — how to see where an attachment came from

Hi there,

New to DT, I just managed to import some emails from an mbox file into a fresh database. I’ve read a bit about attachment handling and have the impression that some older posts are no longer relevant because things have changed in DT4, so I hope my question is not total nonsense.

I noticed DT imports the email and the attachments separately, and somehow links the two. That is really great, since it allows deduplicating attachments that appeared in several emails.

However, I don’t really understand when there’s a link from the file to the email and when there isn’t. I have the impression there is a link for embedded images, but not for other attachments? Here are two examples, one for a PDF (no backlink, at least not where I looked, in the “mentions” section), and one for an image:


Edit: I also saw the graph view, which somehow links to the original email, but also to other, not directly related emails, so that’s not really ideal.

Is there another way to see to which email(s) an attachment belongs?

Have a great day!

That is not a link between the attachment and the email. It is a mention just meaning the email has mentioned the name of the attachment(s) in its content. Also, the Graph inspector is similarly just showing mentions or possible See Also connections between the email and the attachments.

Thank you for your help.

OK, I understand. What is not clear to me is why an image in the content counts as a mention and a “normal” attachment doesn’t. Is there any other way to trace an attachment back to the original email?

I’m not sure if I follow you: I tried to search (online and in the forum) for that sentence, but didn’t find anything.

In the meantime, I tried other things to find a “backlink” and noticed two things:

  1. In the email, the attachment has an extension. If I search for that name exactly (nameofmyfile.pdf), I don’t find anything in the database. I have to search for nameofmyfile without the extension for it to show the attachment.
  2. With or without the extension, searching for the name of the attachment never brings up the associated email.

Is that how it’s supposed to behave, and is there a workaround for it to behave differently?

It may have a suggested filename.

There’s a DT preference determining how filename extensions are handled. Check that out.


Is the attachment the only content in the email?
Look at the Concordance inspector.

Also, why do you want the attachments separated? That is actually consuming more space as the email isn’t imported with the attachment stripped out.


Thank you for your help @BLUEFROG and @chrillek .

I’m sorry but I don’t understand. The name of the file and the full filename are identical but for the extension, and I can’t see any other option that would have been modified (I didn’t add an alias or anything else). When searching for the filename without the extension, I get exactly one result, the file. When searching with the extension, I get 0 results.

Again, I’m sorry, but I’m not sure what you mean. I searched through the documentation in search of a setting that would come close to what you mean, but I didn’t find anything.

I have some emails where it’s nearly the case. Most of the time, there is some content with it.

Thank you. In my case, it was less helpful than the graph.

I’m not exactly certain of what I want, given all the options that are available in DT. :) In the settings, I activated the import of attachments because, when I checked that checkbox, I simply thought it meant that attachments would be imported together with the emails.

Now, I would really like the attachments to be searchable, and I understand they are searchable when they are separate, not when they’re inside the eml.

Another positive thing would be to be able to do deduplication, if an attachment appears in several emails.

However, if moving the attachment “out of the email” is actually duplicating it, well then that’s another story. I thought the attachment I saw in the email is actually a link to the item in DT that has been detached.

With regards to the search issue, is it wiser to open a separate thread?

Thanks for your help and all the best!

You’re welcome.

Now, I would really like the attachments to be searchable, and I understand they are searchable when they are separate, not when they’re inside the eml.

If an attachment in an email is indexable, it will be part of the searchable text of the email.

Another positive thing would be to be able to do deduplication, if an attachment appears in several emails.

As mentioned, the email is imported intact, including the attachment, so if the same attachment is in several emails, there is no deduplication. Similarly, importing the attachments separately won’t minimize duplicate attachments either.
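
To get a feel for how often the same attachment recurs across messages, and which emails each one came from, the raw messages can be inspected outside DT. The following is a minimal Python sketch using only the standard library. It is not how DEVONthink handles attachments; it simply hashes each attachment payload and records the Message-ID of every message that carries it. The names `index_attachments` and `build_demo_message` are hypothetical and exist only for this illustration.

```python
import hashlib
from collections import defaultdict
from email import message_from_bytes, policy
from email.message import EmailMessage


def index_attachments(raw_messages):
    """Map the SHA-256 of each attachment payload to the messages containing it.

    raw_messages: iterable of bytes, each one a complete RFC 5322 message.
    Returns {digest: [(message_id, filename), ...]}.
    """
    index = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_bytes(raw, policy=policy.default)
        message_id = str(msg["Message-ID"])
        for part in msg.iter_attachments():
            # Nested multipart attachments have no single payload; treat as empty.
            payload = part.get_payload(decode=True) or b""
            digest = hashlib.sha256(payload).hexdigest()
            index[digest].append((message_id, part.get_filename()))
    return dict(index)


def build_demo_message(message_id, body, filename, data):
    """Build a small test message with one PDF-typed attachment (illustration only)."""
    msg = EmailMessage()
    msg["Message-ID"] = message_id
    msg["Subject"] = "demo"
    msg.set_content(body)
    msg.add_attachment(data, maintype="application", subtype="pdf", filename=filename)
    return msg.as_bytes()


if __name__ == "__main__":
    first = build_demo_message("<a@example.com>", "first", "report.pdf", b"%PDF-demo")
    second = build_demo_message("<b@example.com>", "second", "report-copy.pdf", b"%PDF-demo")
    for digest, refs in index_attachments([first, second]).items():
        print(digest[:12], refs)
```

Any digest that lists more than one Message-ID is an attachment that appears in several emails, regardless of its filename. For an mbox file, the standard library's `mailbox.mbox(path)` yields messages whose `as_bytes()` output could be passed to `index_attachments`.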

With regards to the search issue, is it wiser to open a separate thread?

That was my point about the Concordance inspector. Do you see the filename of the attachment in the Concordance for the email? I’m suspecting the answer is no.

Also, @cgrunenberg would have to comment on this but since image attachments in emails aren’t indexed, it’s quite possible the filename is included in the searchable text for the email. For indexable content, the filename isn’t.


The filename isn’t indexed so far.


I misunderstood your post. Just wanted to point out that afaict there’s no requirement for an attachment to have a filename. With or without extension.


Thank you for your help!

OK, thank you, good to know. I misunderstood what is written in the documentation on page 78 then.

Imported email messages (.eml) have their contents indexed (excluding the contents of attachments).

I tested and it worked, for indexable attachments. Where it doesn’t work is for PDFs without OCR. These can be searched when they’re separate elements, but not when they’re inside the message (since DT is not performing OCR on the attachments I suppose).

OK, thank you. So the only way to achieve both deduplication and OCR of the attachments would be to deactivate importing the attachments in the settings (which doesn’t actually stop the attachments from being imported, but only stops them from being imported as separate items?), and then use a script such as this one: Separate imported e-mail attachments for better search?

Indeed, it’s not there.

Oh, that’s sad. Is there a reason for that? If not, I’d like to suggest it as a minor improvement that can certainly be helpful in many situations.

Thank you, good to know!

OK, thank you, good to know. I misunderstood what is written in the documentation on page 78 then.

I will fix and clarify this for the next release.

Where it doesn’t work is for PDFs without OCR. These can be searched when they’re separate elements, but not when they’re inside the message (since DT is not performing OCR on the attachments I suppose).

That is correct. No text layer on attached PDFs means no indexing.

would be to deactivate importing the attachments in the settings (which actually doesn’t deactivate importing the attachments, but only deactivates importing them as separate items?), and then use a script

There’s no need to use a script. Select the email with an attachment in your database. Then either drag and drop the attachment out to the item list or use Tools > Import Email Attachments.



Thank you, I tried that. However, I don’t see a group being made with the attachment somehow “related” to the parent email. That’s why I thought a script is the only option, but maybe I’m not using it right.

It doesn’t make a group. Yes, you could use a script if you wanted it to be grouped. In fact, if you were handling things inside Apple Mail instead of dealing with giant .mbox files, you could select an email in Mail and use the Add message(s) and attachments script in the menubar there.


Thank you!

That’s what I plan to do in the future, once the big import has worked, for all subsequent emails.

Looking at it more calmly, I actually see two more differences. To quote @mdbraber, who wrote the script, there are at least these advantages over the built-in solution:

The backlink being what made me write this post in the first place.

In addition, if the Python script mentioned here (Separate imported e-mail attachments for better search - #4 by mdbraber) still works, then I can also remove the attachment from the eml and save space by replacing all the duplicates with replicants.
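
As a rough illustration of what “removing the attachment from the eml” can mean at the file level, here is a minimal standard-library Python sketch. It is not the linked script and makes no claim about how that script works; `strip_attachments` and `build_demo` are hypothetical names, and the sketch only drops attachment parts while keeping the message body. Replacing duplicates with replicants would still have to happen inside DT.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def strip_attachments(raw):
    """Return (message bytes without attachments, list of removed filenames).

    raw: bytes of one complete message (the contents of an .eml file).
    """
    msg = message_from_bytes(raw, policy=policy.default)
    if not msg.is_multipart():
        return raw, []  # nothing attached, leave the message untouched
    attachments = list(msg.iter_attachments())
    removed_ids = {id(part) for part in attachments}
    # Keep every top-level part that is not an attachment (typically the body).
    kept = [part for part in msg.get_payload() if id(part) not in removed_ids]
    msg.set_payload(kept)
    return msg.as_bytes(), [part.get_filename() for part in attachments]


def build_demo():
    """Build a small test message with one attachment (illustration only)."""
    msg = EmailMessage()
    msg["Message-ID"] = "<demo@example.com>"
    msg["Subject"] = "demo"
    msg.set_content("hello")
    msg.add_attachment(b"%PDF-demo" * 100, maintype="application",
                       subtype="pdf", filename="report.pdf")
    return msg.as_bytes()


if __name__ == "__main__":
    original = build_demo()
    stripped, removed = strip_attachments(original)
    print(len(original), len(stripped), removed)
```

Working on a copy of the .eml files is advisable with any approach like this, since the stripped message no longer contains the attachment data.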

However, I don’t know whether the above-mentioned scripts still work in DT4.

Edit: there’s a new version of the script, which @AWD seems to have made compatible with DT4:

I’d contact the script’s author, @mdbraber.
