I have been using the Google platform for my businesses pretty much since Google first started offering it.
Some time ago I used Google Takeout to download and archive materials from one of those businesses. In this particular case I have 10 years’ worth of emails and attachments.
Google Takeout creates an mbox-format email archive for downloading. Conveniently, Devonthink 3 provides a way to import mail in mbox format. My only reservation was the sheer scale of the archive:
- Google Takeout mbox = 14.04 GB on disk
- 65,539 emails and attachments
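If you want to check the scale of your own archive before importing, something like the following should do. This is only a minimal sketch using Python's standard mailbox module: the function name summarize_mbox is made up for illustration, the size is reported in decimal GB (my assumption about how the figure above was measured), and the count is of messages only, with attachments living inside them.

```python
import mailbox
import os


def summarize_mbox(path: str) -> tuple[int, float]:
    """Return (message count, size in decimal GB) for an mbox file."""
    size_gb = os.path.getsize(path) / 1e9  # assumption: 1 GB = 10^9 bytes
    # create=False: raise an error instead of creating an empty file if the path is wrong
    box = mailbox.mbox(path, create=False)
    try:
        # len() scans the file for "From " separator lines; slow on a multi-GB archive
        count = len(box)
    finally:
        box.close()
    return count, size_gb
```

Calling summarize_mbox on the Takeout file returns the two numbers as a tuple. Expect the scan to take a while on an archive of this size.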
From experience, most software import functions/tools break when faced with large amounts of data to process. Even the more robust ones might face issues when your computer inadvertently goes to sleep, or resources disappear (e.g. disks get unmounted) during the process. Usually the import process is unrecoverable and you have to start again. If you have tried opening a large Microsoft Entourage archive (back in the day!) or importing tens of thousands of photos into Lightroom you’ll know what I am talking about.
So it was with apprehension (and curiosity) that I created a new empty database and imported the Google Takeout mbox (File > Unix Mailbox…, selecting the .mbox file). This was my setup:
- Google Takeout mbox was on a SanDisk SSD connected to my 2018 13" MacBook Pro via USB-C.
- A new Devonthink database was created on the same volume (disk) to remove the variable of copying across volumes, although I think it shouldn’t make a difference.
- Closed my biggest databases in Devonthink just in case, although I don’t think it makes a difference.
Results
Well, it worked! The import worked flawlessly (except the post-import grouping – see below). Here are some stats:
- Time elapsed: 25 hours (excluding processing to group emails – see below), but CPU time was 15h. Likely this is because the laptop went into sleep mode occasionally (and in particular overnight) and the import process resumed gracefully when it woke up.
  - You can prevent your Mac from going to sleep by using caffeinate (see the caffeinate man page on SS64.com), but I decided not to as an experiment.
- 2 emails had “Couldn’t extract text from mail message” errors. That’s an error rate of 0.00305%.
- The dtBase2 file ended up being 17 GB – 3 GB larger than the mbox.
- Devonthink took up an average of 70% of CPU time throughout the import process. It used mainly 2 of the 4 cores. It played nicely with other resident apps: I had many apps open (DayOne, PyCharm, Obsidian, Safari, iTerm2, Mail, iMessage, Notes) and was still able to work normally.
- Devonthink took up an average of 8.5 GB of RAM (out of 16 GB) with just over 20 threads running.
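If you would rather not rely on the import resuming after sleep, the caffeinate option mentioned in the list above could be wrapped roughly like this. It is a minimal sketch, not something I ran for this import: the helper names are made up for illustration, and you would need to look up the process ID of the importing app yourself (for example in Activity Monitor). The -i flag (prevent idle sleep) and -w flag (wait for a given process to exit) come from the caffeinate man page.

```python
import shutil
import subprocess


def caffeinate_command(pid: int) -> list[str]:
    """Build a caffeinate invocation that blocks idle sleep until `pid` exits."""
    # -i: prevent the system from idle sleeping
    # -w: hold that assertion until the process with this PID exits
    return ["caffeinate", "-i", "-w", str(pid)]


def keep_awake_while(pid: int):
    """Start caffeinate for `pid`; return None where caffeinate is not installed."""
    if shutil.which("caffeinate") is None:  # e.g. not running on macOS
        return None
    return subprocess.Popen(caffeinate_command(pid))
```

Passing the importing app's process ID to keep_awake_while starts caffeinate in the background, and it exits on its own once that process ends.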
One other variable here was that I had set up a Smart Rule some time ago to automatically prefix any imported email’s filename with the date and time the email was created. This is so that I can quickly see when any email was sent or received. (Not strictly necessary, as you can open a “Date Created” column.) Well, the rule processed each of the 65k+ emails that were imported. So if you were to do this on an identical setup without the rule, the import would likely be quicker.
But I am impressed that the added complexity of the rule did not hamper the reliability of the import tool.
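To give a feel for the renaming that rule performs on every message, here is a rough Python illustration. It is not how Devonthink's Smart Rules are written, and the prefix format and the function name prefixed_name are assumptions of mine for the sake of the example. It reads the date from the email's Date header.

```python
from email.utils import parsedate_to_datetime


def prefixed_name(date_header: str, subject: str) -> str:
    """Prefix a subject-based filename with the message's date and time."""
    # Parse an RFC 2822 style Date header into a datetime
    sent = parsedate_to_datetime(date_header)
    # Assumed format: "YYYY-MM-DD HHMM <subject>"
    return f"{sent:%Y-%m-%d %H%M} {subject}"
```

A sortable prefix like this means an alphabetical listing is also a chronological one, which is the point of the rule.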
Once the emails were pulled into the database, the importer started to group emails into threads. This ran for about 3 hours and then Devonthink crashed. Grouping is not important for me as I use search to find emails. (Of course, this can be turned off in the Email tab of Devonthink’s Preferences.)
Conclusion
Devonthink itself is a great tool to manage archived emails. This is a whole topic on its own.
If you have large Google Takeout (or other mbox format) mail archives, Devonthink provides a reliable import tool, at least in my case up to 65,539 emails. Kudos to the team for baking reliability into the import tools. (I had a similarly great experience importing my more-than-a-decade-old Evernote database many moons ago.)
For those of us who depend on Google for email, this is great because it gives us the option of moving our old emails away from Google’s servers into a workable archive on Devonthink. My nightmare scenario is a single point of failure, such as being accidentally locked out of my Google account and losing access to years of history and information (not to mention all my Google Docs, web domains, GCP VMs, Firebase projects, etc. – but that’s another story).