Better email import

My understanding from reading the forums and looking at the suggestions is that the current mail import plugin cannot import HTML mail or attachments.

I would really love to see this feature in DT2.0, regardless of how difficult it is to do. I think it is essential in this day and age of thousands of attachments, and a ton of HTML mail that needs to be saved AS IS.

Thanks!

Sorry, but I disagree. It’s a better practice to make some early decisions rather than save everything indiscriminately. Just save attachments in a separate folder, called Attachments and sorted by date. To retain them in a mail database is risky. When you remove attachments from e-mail, the file names are recorded in the e-mail text, so you may locate them easily. With attachments stored separately, you may make backups or compress them in order not to eat up storage room. As for HTML mail, you may also save them as needed, but you’re wasting a lot of disk space on ads and styling.
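For anyone who wants to try the separate-folder approach described above, here is a minimal Python sketch using only the standard library. It is an illustration, not a feature of Mail or DEVONthink: the `save_attachments` helper, the sample message and the `Attachments/YYYY-MM-DD` layout are assumptions based on the suggestion in this post.

```python
import tempfile
from email.message import EmailMessage
from email.utils import parsedate_to_datetime
from pathlib import Path

def save_attachments(msg, root):
    """Write each attachment under root/Attachments/YYYY-MM-DD/ and return the paths."""
    day = parsedate_to_datetime(msg["Date"]).strftime("%Y-%m-%d")
    folder = Path(root) / "Attachments" / day
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in msg.iter_attachments():
        # Keep only the bare file name so a crafted name cannot escape the folder.
        name = Path(part.get_filename() or "unnamed").name
        target = folder / name
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved

# Hypothetical message, built here only to exercise the function.
msg = EmailMessage()
msg["Date"] = "Mon, 05 May 2008 10:00:00 +0000"
msg.set_content("Report attached.")
msg.add_attachment(b"report body", maintype="application",
                   subtype="octet-stream", filename="report.bin")

with tempfile.TemporaryDirectory() as root:
    paths = save_attachments(msg, root)
    relative = [p.relative_to(root).as_posix() for p in paths]
print(relative)  # ['Attachments/2008-05-05/report.bin']
```

In real use you would pass a permanent folder instead of a temporary directory and decide what to do when two attachments share a name.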

Where did you read that? DEVONthink Pro Office imports mail messages completely, attachments and all. You can download and try the demo to see if it does what you want it to do. There is no need to wait until 2.0 (at least not for this feature).

With the DT Pro Office mail plugin, HTML mail (and text email) is archived as rich text, including (optionally if images haven’t already been downloaded) images.

Attachments are included, although they would have to be opened and saved in order to be imported and indexed.

The mail archiving scripts in DT Pro only capture the plain text content of messages, not images nor attachments.

EagleFiler is many times better than DevonThink in terms of email archiving imho: it keeps the original copy of the email and you may view the email in Mail.app directly (hence you are able to forward and such). It has however a few shortcomings (no direct import from Mail for instance, you have to slide and drop emails one-by-one, no ability to compress the archived emails, …).

As discussed in another thread, DEVONthink’s convert-to-RTF function is broken for inline attachments (although that’s Apple’s fault). I’d just like DEVONthink to become a decent email archiver (one archiver to rule them all), but for me it’s not there yet.

You do not have to drag messages one-by-one from Mail to EagleFiler. You can select multiple mailboxes (or messages) and press the capture key (F1), and EagleFiler will import them all in one go. You can also drag-import whole mailboxes.

I am confused about the email import capabilities of the DevonThinkProOffice 1.5.1 plugin, namely the differences in how import and import-all treat attachments. In the manual it says:

“If you want to import whole mailboxes including all sub-mailboxes,
you need to add the ‘Import All’ tool to the Import Mail window’s toolbar.
… Please note that this imports all messages of the selected
mailbox and in all sub-mailboxes contained in it without any filtering. It also does not check for unloaded attachments.”

What does it mean to say that it does not check for unloaded attachments, and how is this different from how Import treats “unloaded” attachments? What are unloaded attachments?

Also, in a previous post, a DEVONthink person said:

"With the DT Pro Office mail plugin, HTML mail (and text email) is archived as rich text, including (optionally if images haven’t already been downloaded) images. "

What is meant by downloaded images? Images downloaded in Apple Mail or DevonThink?

Also,
"Attachments are included, although they would have to be opened and saved in order to be imported and indexed. "

Opened and saved where? In Apple Mail or DevonThink Pro?
I am not sure what this means.

I understand that in Apple Mail, if an attachment is edited, a copy is saved in the ~user/library/mail/mail downloads folder. How does DevonThink treat these attachments in the Mail Downloads folder? Are they imported into DevonThinkPro when using the Apple Mail plugin?

It is very similar to how Mail works and displays its messages.

  1. Attachments may not be downloaded from the server in Mail (see the advanced configuration option in your Mail Account settings). If you browse in our Mail Import window through your mailboxes, this is checked because we need some of it for display purposes. On “Import All” that doesn’t happen since nothing is being displayed and thus the extra consistency checking is not enabled.

  2. Like Mail, we can prevent spammers from discovering their targets by preventing the loading of linked images that are referenced by a URL to a different site.

  3. Attachments are saved inside the RTFD document that you’ll find in DEVONthink. But DEVONthink doesn’t index attachments at the moment. So if you need to search data inside these, you’ll need to save them separately.

“1. Attachments may not be downloaded from the server in Mail (see the advanced configuration option in your Mail Account settings). If you browse in our Mail Import window through your mailboxes, this is checked because we need some of it for display purposes. On “Import All” that doesn’t happen since nothing is being displayed and thus the extra consistency checking is not enabled.”

The only relevant Apple Mail setting I could find is in Preferences > Viewing: “Display remote images in HTML messages”.
It is my understanding that this setting does not apply to MIME attachments, only to URLs of images in HTML messages. Doesn’t Apple Mail always “download” file attachments? Aren’t these part of the original email as MIME attachments, not embedded URLs?

I am assuming that the extra DTP consistency checking is only for remote URLs, not MIME attachments. The source of my confusion, as further supported by your answers, is that your documentation is unclear about what “attachment” means. Embedded URLs in HTML messages are not attachments in the same sense that MIME attachments at the end of an email are. It would be better to make a clear distinction in your documentation.
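The distinction raised here, between attachments carried inside the message as MIME parts and images that an HTML body merely links to, can be illustrated with Python’s standard `email` package. This is a minimal sketch, not DEVONthink’s or Mail’s code; the sample message, the `classify` helper and the regular expression are assumptions made for illustration.

```python
import re
from email import message_from_string
from email.policy import default

# Hypothetical sample: an HTML part that links to a remote image, plus one
# real MIME attachment whose bytes travel inside the message itself.
RAW = """\
From: a@example.com
To: b@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/html; charset="utf-8"

<html><body><p>Hi</p><img src="http://example.com/logo.png"></body></html>
--XX
Content-Type: text/plain; name="notes.txt"
Content-Disposition: attachment; filename="notes.txt"

attached notes
--XX--
"""

def classify(raw):
    """Return (attachment file names, remote image URLs) for one raw message."""
    msg = message_from_string(raw, policy=default)
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    remote_images = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            html = part.get_content()
            # Simplified pattern: only double-quoted http(s) image sources.
            remote_images += re.findall(r'<img[^>]+src="(https?://[^"]+)"', html)
    return attachments, remote_images

attachments, remote_images = classify(RAW)
print(attachments)    # ['notes.txt']
print(remote_images)  # ['http://example.com/logo.png']
```

The attachment is always present once the message has been fetched, while the linked image exists only as a URL until some client decides to load it.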

My original question was how DTP treats edited attachments saved in the Mail Downloads folder. Does DTP archive the version in the downloads folder or only the MIME attachment in the original email?

“2. Like Mail, we can prevent spammers from discovering their targets by preventing the loading of linked images that are referenced by a URL to a different site.”

If I later browse an archived email in DTP will DTP be able to download
any images that were not downloaded before archiving?

When using “import all”, does that mean that no URL images are archived, or only the ones that have already been downloaded by Apple Mail? I assume that when Apple Mail displays an HTML email, it caches the downloaded images somewhere. Do you archive the cached images, or do you re-download the images? Based on your answer, you re-download the images for an “import” but not for an “import all”? Or do you only re-download if the display remote images preference is checked?

I am still confused about what gets archived and when.

Response 3 did answer one of my questions however.

The Mail import process parses the original message, so you get exactly the attachments that were sent to you. And they will be inserted inline or as file icons depending on how it was specified in the MIME message. So what you see in Mail is what you get in DTPO.
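How a MIME message specifies “inline” versus “file icon” presentation can be sketched with Python’s standard `email` package. This is not the plugin’s code; the sample message and the `presentation` helper are hypothetical, and the mapping from disposition to display is an assumption based on the description above.

```python
from email.message import EmailMessage

# Build a hypothetical message with one inline part and one regular attachment.
msg = EmailMessage()
msg["Subject"] = "disposition demo"
msg.set_content("See the picture and the attached report.")
msg.add_attachment(b"fake-png-bytes", maintype="image", subtype="png",
                   filename="photo.png", disposition="inline")
msg.add_attachment(b"fake-pdf-bytes", maintype="application", subtype="pdf",
                   filename="report.pdf")  # disposition defaults to "attachment"

def presentation(part):
    """Map a part's Content-Disposition to how a viewer would likely show it."""
    disposition = part.get_content_disposition()  # "inline", "attachment" or None
    return "shown inline" if disposition == "inline" else "shown as file icon"

layout = {part.get_filename(): presentation(part)
          for part in msg.iter_attachments()}
print(layout)  # {'photo.png': 'shown inline', 'report.pdf': 'shown as file icon'}
```

Both parts are stored in the message either way; the disposition only tells the viewer how to present them.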

That depends on the Apple internal conversion routines from HTML → RTF. We only prevent the actual download from taking place with this preference. You should normally get one of those question mark image icons.

And all this has nothing to do with the difference between Import (All). The import process will request the complete original MIME data from each message that Mail reports to be present in each mailbox and convert these for both button actions.

To summarise: the import process will do its utmost to preserve all (MIME) data that is fit for human consumption.

Great, starting to converge.
So there is no difference between import and import all as far as
MIME attachments are concerned. All MIME attachments (fit for human consumption) will be archived.

So the difference must be localized to the download of remote html images.

So just to clarify, the difference between import-all and import is?

I am guessing here:

import-all does not download remote images in html messages under any circumstance?

import will download remote images in html messages if the DTP preference (download remote images in html messages) is checked and will not otherwise?

or does the Apple mail preference of the same name have an effect on the behavior?

There is no difference at all between both import processes as far as it concerns the contents of the messages.

If that’s the case, then what exactly do you mean by the sentence

“It also does not check for unloaded attachments.”
in your documentation?

It seems we have come full circle.
First I tried to clarify what “attachments” are in the phrase above,
and then what “unloaded” means and what the implications of that are.
Now you are telling me that there is no difference? Or is it just that it requires asking one of your programmers, and that is too much trouble, so you are going to pretend that there is no difference?

Frankly, I want to archive my email. But I want to know what information
may be lost in the process. It is more convenient to use Import All since it is less tedious, but if there is something lost vs Import then I need to know that in order to decide if I should use Import instead.

I can’t believe it is so hard to ask for clarification and definition of your documentation. Is this really how you want to treat your customers?

Annard is the programmer who wrote the code for the mail archiving procedures in DTPO. His responses in this thread were accurate.

The Mail archive procedures will by default honor your decisions about what you’ve told Mail to include in the display of messages.

Images: I’ve told Mail not to automatically download and display images. Why? Because spammers use image downloads to confirm that they have hit an active email account. Once confirmed, they often sell active addresses to other spammers. Spammers often include invisible images in what appears to be a text message.

Attachments: By default, Mail automatically downloads and includes attachments in messages. That’s what I want. If that is the way you have set up Mail, don’t worry about attachments. They will be included in your mail archives in your DTPO database.

When I’m archiving messages, I can instruct the plugin to go ahead and download images from the Internet. Obviously, that will take longer. And unless I’ve excluded junk mail and trash folders from the download, I would likely start getting more spam.

Most of my messages don’t include images. Some do, and if they are important I’ll open the message, download the images and save it to a database.

Bill, thanks for your post. It is helpful to see how someone uses the archiving feature. It gives me more confidence. In my case I am archiving mail that I have read. I have already eliminated any spam. These are important emails from business and other activities that I want to keep around as a reference. I have thousands of them, and it’s becoming burdensome to have them all in Mail.

Just to clarify: when you say that, if an email has images you want to keep, you open the email, download the images and save it to the database, I assume you are opening it in Mail, not DTPO, since DTPO also has the option of downloading remote images under some (as yet to be clarified) circumstances.

But your response doesn’t specifically address my question.

The documentation makes a point of stating that there is a difference in behavior between “import all” and “import”. So either the documentation is in error and there is no difference, or Annard has chosen not to clarify the distinction. Saying there is more consistency checking just begs the question. He may feel that it is not worth his time to clarify, and that as a mere user I don’t need to know, because the difference is not important. He may well be right. But then why did the author of the documentation feel it important enough to make the distinction in the first place?

My experience is that some software companies have little regard for their customers and if someone has a question or problem that can not be answered with a cookie cutter response then it is too much trouble. Other companies go to great lengths to answer and clarify knowing that this builds authenticity and a lasting relationship with the customer that spreads through word of mouth to other potential customers. Until now, I put Devon in this latter category.

Given the value I put in my email archives, the answer to this question is very important to me. Moreover, after being a loyal customer for several years, I find it disappointing to be given the run around by one of the developers.

The fact that Annard is the programmer only means there is no excuse for not clarifying. It certainly doesn’t give me warm and fuzzies about trusting my email archives to DTPO.

-------------------Addressed to Annard ---------

If I have offended your intelligence, Annard, by asking pointed questions and/or pointing out ambiguities in your answers, then I am sorry. It was not my intent to offend.
But I did mean to be direct and unambiguous in pointing out that the answers were not responsive to my questions, and that it was upsetting to me, a paying customer, to be given non-responsive answers. I have one set of expectations when a user answers questions on a company’s forum. I have another when an employee answers.

So Annard, if you would be so kind as to clarify the distinction between Import and Import All as pointed out by the manual, I would be most grateful. It would give me confidence that I can entrust my valuable email archives to DTPO and reaffirm my high regard for Devon. I have a PhD in Electrical and Computer Engineering and have spent 25 years designing and developing embedded software and hardware. So do not be concerned that your answer might be too technical for me. If anything, it’s the lack of technical specificity that is frustrating to me.

Look at the Advanced settings for your Mail account. Although the default behavior is to download attachments, you are given the option not to download attachments. That’s the setting noted by Annard concerning checks made during archiving.

I always want to download attachments. So the difference in the way Import and Import All check for attachments becomes moot, as Annard noted. When a message is archived, it will have the content – text, images (or not), attachments – that I would see in the original message when I display it.

Differences between Import All and Import discussed here are that the former neither provides the filter options provided by Import, nor does it check for undownloaded attachments.

But the bottom line is that if you use the default setting in Mail preferences to import attachments, you will have your attachments in messages archived to your database, whether you have used the Import or Import All mode for archiving.

As Bill said, being the developer, I am very careful about what I write since I wrote the code. However I tend to be terse. If that comes over to you as giving you a run around, I apologise.

So let me reiterate, despite what the documentation says (and we may need to update it now that I checked it to be sure because I really like to see happy customers):
There is no difference between “Import” and “Import All” where it concerns the contents of the imported messages (that includes any and all attachments).
The only difference is that, since the Mail Import window displays the contents of each mailbox in a table when you use the Import button, a check is done whether the message count reported for that particular mailbox corresponds to the actual count of the displayed message list. That is the one and only difference.

When can this difference happen? If you do not locally cache the complete contents of a possibly inactive mail account on your local machine. Then Mail will cache only a few bits and pieces of each mailbox and has to download a message every time it is being displayed. So say it reports back that there are 11 messages in Inbox, but the last 2 came in before you displayed them in Mail. Now Mail will return 9 actual messages that are available locally. This happened to me during development when I had an old and unused mail account that was not active and I was using it for testing. So in all practicality, if you have an active account, this will not happen. But I would still advise you before the initial import of a whole tree of mailboxes to change the mail account advanced settings to cache everything locally and run a Synchronize Mailboxes command in Mail if you haven’t done so already.
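The kind of count check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin’s implementation; the `MailboxSnapshot` type, its field names and the `missing_messages` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MailboxSnapshot:
    """Hypothetical view of one mailbox: reported count vs. local messages."""
    name: str
    reported_count: int      # what the mail client says the mailbox holds
    local_message_ids: list  # messages actually available on this machine

def missing_messages(box):
    """Number of messages reported for the mailbox but not available locally."""
    return max(box.reported_count - len(box.local_message_ids), 0)

# The example from the post: 11 messages reported, only 9 cached locally.
inbox = MailboxSnapshot("Inbox", reported_count=11,
                        local_message_ids=[f"msg-{i}" for i in range(9)])
gap = missing_messages(inbox)
if gap:
    print(f"{inbox.name}: {gap} message(s) not cached locally; synchronize first")
```

A gap of zero means the reported and local counts agree, which is the normal case for an active, fully cached account.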

As for secure storage: I would always recommend burning a backup of the original mailboxes from Mail that you want to archive in DTPO, and then burning a backup of your DEVON database afterwards. Create and store copies of these backups in different geographical places (preferably a bank safety deposit box that is water and termite free, so some banks in India are right out). Rinse and repeat. But that is because I’m extremely paranoid when it comes to data storage.

Many thanks Annard. I understand now. And thanks for clarifying that the documentation is misleading.

My email boxes are POP not IMAP so I believe the synchronize and cache concerns you describe do not apply. But it sounds like it might not be a bad idea to at least “rebuild” Apple mail boxes before archiving to DTPO.

I planned on also saving a separate apple mail archive in addition to importing to DTPO.

One more thing. If you revise the documentation, it might be good to include your suggestions for synchronizing IMAP and/or rebuilding POP mailboxes before using Import All.

Many thanks to all who contributed to the answer to this, as well as Annard and Bill for an interesting discussion!

I didn’t realize that the import mail feature was stripped down for the non-Office version, and was clearly reading the forums about the lesser version.

Looking forward to trying the Pro Office version, but I have one question. If I download and import my mail or scan some things into the DTDB using the Pro Office version, but decide to wait to put any more money down until 2.0 comes out, can I still use and see what I have imported? I.e. will the database be changed in any way that it can’t still be used with the non-Office version?

Thanks again!

John