Archiving emails from a massive Google Takeout in Devonthink 3

I have been using the Google platform for my businesses pretty much since Google started offering it.

Some time ago I used Google Takeout to download and archive materials from one of those businesses. In this particular case I have 10 years’ worth of emails and attachments.

Google Takeout exports email as an mbox-format archive, and conveniently, Devonthink 3 can import mail in mbox format. My only reservation was the sheer size of the archive:

  • Google Takeout mbox = 14.04 GB on disk
  • 65,539 emails and attachments
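Before committing to a long import, it can help to confirm the message count and file size of the Takeout mbox yourself. Below is a minimal sketch, assuming Python 3 and its standard-library `mailbox` module. The function name `summarize_mbox` and the script name in the comment are hypothetical. Counting means Python has to scan the whole file, which for a 14 GB mbox will itself take a while. The result counts messages only, so it may not line up exactly with a figure that includes attachments.

```python
import mailbox
import os
import sys


def summarize_mbox(path):
    """Return (message_count, size_in_bytes) for a Unix mbox file."""
    # create=False raises mailbox.NoSuchMailboxError instead of creating an empty file.
    box = mailbox.mbox(path, create=False)
    try:
        # len() builds a table of contents by scanning the file for "From " separator lines.
        count = len(box)
    finally:
        box.close()
    return count, os.path.getsize(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python summarize_mbox.py /path/to/takeout.mbox
    count, size = summarize_mbox(sys.argv[1])
    # Decimal GB, which is how macOS Finder reports sizes.
    print(f"{count:,} messages, {size / 1e9:.2f} GB")
```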

In my experience, most import tools break when faced with large amounts of data. Even the more robust ones can run into trouble if your computer goes to sleep, or if resources disappear (e.g. disks get unmounted) mid-import. Usually the import can't be resumed and you have to start again. If you have tried opening a large Microsoft Entourage archive (back in the day!) or importing tens of thousands of photos into Lightroom, you’ll know what I am talking about.

So it was with apprehension (and curiosity) that I created a new empty database and imported the Google Takeout mbox (File > Unix Mailbox…, selecting the .mbox file). This was my setup:

  • Google Takeout mbox was on a SanDisk SSD connected to my 2018 13" MacBook Pro via USB-C.
  • A new Devonthink database was created on the same volume (disk) to rule out any effect of copying across volumes, although I don’t think it makes a difference.
  • Closed my biggest databases in Devonthink just in case, although I don’t think it makes a difference.

Results

Well, it worked! The import worked flawlessly (except the post-import grouping – see below). Here are some stats:

  • Time elapsed: 25 hours (excluding the processing to group emails – see below), but CPU time was only 15 hours. The gap is likely because the laptop went to sleep occasionally (in particular overnight), and the import resumed gracefully each time it woke up.
  • 2 emails had “Couldn’t extract text from mail message” errors. That’s an error rate of 0.00305%.
  • The dtBase2 file ended up being 17 GB – 3 GB larger than the mbox.
  • Devonthink took up an average of 70% of CPU time throughout the import process. It used mainly 2 of the 4 cores. It played nicely with other resident apps: I had many apps open (DayOne, PyCharm, Obsidian, Safari, iTerm2, Mail, iMessage, Notes) and was still able to work normally.
  • Devonthink took up an average of 8.5 GB of RAM (out of 16 GB) with just over 20 threads running.
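The derived numbers in the list are easy to re-check. Here is a quick calculation in Python that uses only the figures already listed (nothing new is measured). The throughput is simply the message count divided by wall-clock time, so it includes the periods when the laptop was asleep.

```python
# Figures taken from the stats list above.
emails = 65_539
failed = 2
wall_hours = 25
mbox_gb = 14.04
db_gb = 17

# Share of messages that produced "Couldn't extract text" errors.
error_rate_pct = failed / emails * 100
# Database overhead relative to the source mbox.
overhead_gb = db_gb - mbox_gb
overhead_pct = overhead_gb / mbox_gb * 100
# Average throughput over the whole run, including time spent asleep.
per_hour = emails / wall_hours

print(f"error rate: {error_rate_pct:.5f}%")
print(f"overhead:   {overhead_gb:.2f} GB ({overhead_pct:.0f}%)")
print(f"throughput: {per_hour:,.0f} emails per hour")
```

This works out to an error rate of 0.00305%, about 2.96 GB (21%) of database overhead, and roughly 2,622 emails per wall-clock hour.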

One other variable: some time ago I had set up a Smart Rule that automatically prefixes each imported email’s filename with the date and time the email was created, so that I can quickly see when any email was sent. (Not strictly necessary, as you can show a “Date Created” column.) The rule processed every one of the 65k+ imported emails, so on an identical setup without the rule, the import would likely be quicker.
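The Smart Rule itself is configured in Devonthink's UI, so there is no code to show from it. To make the naming scheme concrete, here is a hypothetical Python equivalent of the rename. It assumes a `YYYY-MM-DD HH-MM` prefix built from the message's `Date` header. The exact format the rule produces isn't stated in this post, and `date_prefixed_name` is an illustrative name only.

```python
from email.utils import parsedate_to_datetime


def date_prefixed_name(date_header, subject):
    """Prefix a subject with the send date/time from an RFC 5322 Date header."""
    # parsedate_to_datetime keeps the sender's UTC offset, so the time shown
    # is the sender's local time rather than yours.
    sent = parsedate_to_datetime(date_header)
    # Replace "/" and ":" so the result is safe to use as a macOS file name.
    safe_subject = subject.replace("/", "-").replace(":", "-")
    return f"{sent:%Y-%m-%d %H-%M} {safe_subject}"


print(date_prefixed_name("Wed, 04 Mar 2015 09:15:00 +0000", "Invoice 12/2014"))
```

Putting the year first means a plain alphabetical sort of the names is also a chronological sort.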

But I am impressed that the added complexity of the rule did not hamper the reliability of the import tool.

Once the emails were pulled into the database, the importer started to group emails into threads. This ran for about 3 hours and then Devonthink crashed. Grouping is not important for me as I use search to find emails. (Of course, this can be turned off in the Email tab of Devonthink’s Preferences.)

Conclusion

Devonthink itself is a great tool to manage archived emails. This is a whole topic on its own.

If you have large Google Takeout (or other mbox-format) mail archives, Devonthink provides a reliable import tool, at least in my case up to 65,539 emails. Kudos to the team for baking reliability into the import tools. (I had a similarly great experience importing my Evernote database, with more than a decade of notes, many moons ago.)

For those of us who depend on Google for email, this is great because it gives us the option of moving our old emails off Google’s servers into a workable archive in Devonthink. My nightmare scenario is a single point of failure, such as being accidentally locked out of my Google account and losing access to years of history and information (not to mention all my Google Docs, web domains, GCP VMs, Firebase projects, etc. – but that’s another story).


Thanks for sharing the experience and the very kind comments!
While you do have a large number of emails, processing the .MBOX is much, much easier than trying to message back and forth between applications.

I’m also in the process of archiving (off of Google) many, many years of Gmail, for the same reasons as Luminary99_0. However, I decided against using Google Takeout to create a huge mbox file because I wanted to be able to update the archive automatically and incrementally over time, without repeatedly re-downloading the same emails (in my case, over 522,000 emails in 11 GB). I researched a number of possible solutions and settled on Mail Archiver X (https://www.mothsoftware.com). I first trialed it on a much smaller account, and it worked fine, so I’m now downloading my main account (it’s about halfway through its initial download of the account via IMAP). In some cases, you’re better off with a targeted solution for a problem than with adapting a more general-purpose program like DT (as much as I like DT).
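Mail Archiver X's internals aren't described here. However, the general reason an IMAP-based tool can update incrementally is that IMAP gives every message in a folder a UID that only ever increases. Below is a minimal sketch using Python's standard `imaplib`. It assumes you have already connected and logged in, and that you stored the highest UID seen on the previous run. `fetch_new_uids` and `last_uid` are hypothetical names, not part of any product.

```python
import imaplib


def fetch_new_uids(conn, folder, last_uid):
    """Return the UIDs in `folder` that are newer than `last_uid`."""
    # readonly=True issues EXAMINE, so later fetches won't mark mail as read.
    typ, _ = conn.select(folder, readonly=True)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"cannot select {folder!r}")
    # "N:*" asks for every UID from N up to the highest one in the folder.
    typ, data = conn.uid("SEARCH", None, f"UID {last_uid + 1}:*")
    if typ != "OK":
        raise imaplib.IMAP4.error("UID SEARCH failed")
    uids = [int(u) for u in data[0].split()]
    # If nothing is new, "N:*" still matches the highest existing UID, so filter it out.
    return [u for u in uids if u > last_uid]
```

Each new UID could then be downloaded with `conn.uid("FETCH", str(uid), "(RFC822)")` and appended to a local archive. A real tool also has to record each folder's UIDVALIDITY, because if the server changes it, previously stored UIDs are no longer valid.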


Please choose Help > Report Bug while pressing the Alt modifier key and send the result to cgrunenberg - at - devon-technologies.com - thanks in advance!

@amalis After reading this thread, and struggling to get DT to import some large email accounts (on Mac and Gmail), I decided to follow your lead on Mail Archiver X.

May I ask - did you find a good way to get the emails it downloads over to DevonThink? Or are they retained in the app, and maybe Filemaker?

@PaulJ I’m just keeping them in Mail Archiver X in the native format. I don’t use Filemaker. I don’t plan to export them to DT, although if I wanted to, Mail Archiver can export to mbox format, which DT can then read.


@amalis Thank you. I do plan to feed an Export from Mail Archiver X over to DevonThink - probably using its scheduler, and using Hazel to automate the process. I’m currently assessing whether the best file format would be mbox or PDFs, once inside DT.

When I have a task that runs for several hours or days, I use the Theine app to keep my iMac or MacBook from going to sleep. It is available in Apple’s App Store.

FWIW, you can also keep your Mac awake with an app called Caffeine (free, I think) or a Keyboard Maestro macro.

I think there is a better way to sweep email into Devonthink on an ongoing basis if you are still actively using your Gmail account. Devonthink 3 Pro has a fantastic Import feature that lets you peer directly into Apple Mail’s live mailboxes.

So the first prerequisite is that your Gmail account must be enabled in Apple Mail.

Then, in the menu bar, select View > Import. The sidebar (leftmost column) will now show all your individual folders in Apple Mail. There are also handy filters so you only see mail that has not yet been imported. Once you have found your target emails, a single click is all that’s required.

If the sidebar is active you can also select the import view by clicking on the 3rd icon from the left at the top of the sidebar:

[Screenshot: DT3 Import Sidebar]
