I’ve recently migrated from a 2018 Mac mini to the new M1 Mac mini using DT Pro 3.6.1. When I import a large mbox file, it starts processing, then crashes. I’ve tried 3 times and it always crashes eventually.
I tried this again on my older 2018 Mac mini (also using 3.6.1) and was able to import the same mbox file.
I’ve created a bug report from DT Pro and sent it to cgrunenberg - at - devon-technologies.com (per their comment on one of the previous crash reports on DT 3.6).
It attempts to identify which emails belong to the same conversation thread and groups each conversation in DEVONthink. However, conversation threading is also not a simple thing to do, as there has never been any standard for it. Add to this the decades of nonconforming emails and it becomes a difficult task for any application. (It’s also the subject of a lot of study and discussion, as it’s not an easy nut to crack with any real assurance of accuracy.)
I’m running into the same problem with DT 3.8.3 on macOS 12.3: grouping e-mail works fine on an mbox with ~10K e-mails (600MB), but crashes on an mbox with ~26K e-mails (1.6GB). Is there a way to prevent the crash when I want to try grouping e-mails?
Btw: the option View > Sort by > Threads also seems to work reasonably well. Am I correct to assume that that option only uses the Name of an item, and that the “Group threads” option uses more complex operations (e.g. the In-Reply-To header etc.)?
Unfortunately I can’t share this .mbox. It seems that it might not be related to the size of the mbox (I tried with a different, even larger mbox and it worked), but maybe to specific messages. I’ll see if I can do some bug hunting on my side.
Thanks for the pointers. There are indeed messages in that mailbox with the same Message-ID. Would those messages not be ignored when importing? I’ll check whether other mailboxes which don’t show any problems also have duplicate Message-IDs. An invalid References header could indeed also be the cause. That’s a bit hard to find, but I can try chopping up the mailbox and see if I can find a / the conflicting message.
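For anyone wanting to run the same check, here is a minimal sketch in Python that lists the Message-IDs occurring more than once in an mbox. It only uses the standard `mailbox` module; the function name `duplicate_message_ids` is made up for this example, and the sketch assumes the file is a plain mbox that Python can parse. It says nothing about how DEVONthink itself detects duplicates.

```python
import mailbox
from collections import Counter


def duplicate_message_ids(mbox_path):
    """Return {message_id: count} for every Message-ID seen more than once."""
    counts = Counter()
    # create=False: fail instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            mid = msg.get("Message-ID")
            if mid:
                # str() in case the header comes back as a Header object
                counts[str(mid).strip()] += 1
    finally:
        box.close()
    return {mid: n for mid, n in counts.items() if n > 1}
```

Calling `duplicate_message_ids("archive.mbox")` (the file name is a placeholder) returns an empty dict when every Message-ID is unique. Messages without a Message-ID header are not counted.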
They’re skipped, but in the case of grouping this might indeed cause problems. E.g. email B references A, but there are multiple emails with the message ID of A. It’s even worse if one of the messages sharing that message ID references B in turn. I’ll try to reproduce this using a patched mbox.
I can see all the threads being created, but at the very end DT crashes, and when I restart DT all threads / groups are gone. I’ll see if it might work (as a ‘hack’) to stop the process right before the very end, e.g. in the last minute.
I’ve eliminated all duplicated Message-IDs, but unfortunately a crash still occurs. I’ll see if I can narrow it down to something else.
EDIT: coincidence or not, it seems that problems arise when mboxes contain > 21,000 messages. A mailbox of 20,500 still works without crashing, a mailbox of 21,100 doesn’t. For now the workaround is to split mbox files that are too big (they’re just text files, so you can easily do it with a text editor like vim). The only trade-off is that threads that span both parts won’t be grouped together.
So far I couldn’t reproduce the issue but the number shouldn’t be a problem, it’s probably “just” the result of invalid message headers. Please check whether stripping either the In-Reply-To: and/or References: header fixes this.
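A test copy without those headers could be produced with a short Python script along these lines. This is only a sketch using the standard `mailbox` module; the function name `strip_threading_headers` is made up, and it assumes Python can parse the mbox. The messages are re-serialized on the way out, so the copy may not be byte-identical to the original apart from the removed headers.

```python
import mailbox


def strip_threading_headers(src_path, dst_path,
                            headers=("In-Reply-To", "References")):
    """Copy src_path to dst_path without the given headers.

    Returns the number of messages written. If dst_path already
    exists, messages are appended to it.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    try:
        for msg in src:
            for name in headers:
                # del removes every occurrence; no error if absent
                del msg[name]
            dst.add(msg)
            count += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return count
```

To test one header at a time, pass e.g. `headers=("References",)` and import the resulting copy instead of the original.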
Another possibility might be to replace the contents & email addresses in the mbox with dummy values and to send me a zipped copy, assuming that such a stripped version is still sufficient to reproduce the issue.
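One way to produce such a stripped copy is sketched below in Python with the standard `mailbox` and `email` modules. All names and dummy values (`anonymize_mbox`, the `example.invalid` addresses, the placeholder body) are made up for this example. The sketch assumes that only Message-ID, In-Reply-To, References and Date matter for reproducing the grouping problem, so every other header is dropped. Note that Message-IDs often contain host names, so the output should be reviewed before sending it anywhere.

```python
import mailbox
from email.message import Message

# Headers copied unchanged (assumption: these are what threading needs)
KEEP = ("Message-ID", "In-Reply-To", "References", "Date")


def anonymize_mbox(src_path, dst_path):
    """Write a copy of src_path with dummy addresses, subjects and bodies."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    try:
        for index, original in enumerate(src):
            clean = Message()
            for name in KEEP:
                for value in original.get_all(name, []):
                    # collapse folded header lines into a single line
                    clean[name] = " ".join(str(value).split())
            clean["From"] = "sender@example.invalid"
            clean["To"] = "recipient@example.invalid"
            clean["Subject"] = f"Message {index}"
            clean.set_payload("Content removed.\n")
            dst.add(clean)
            count += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return count
```

Whether such a copy still triggers the crash has to be verified by importing it before zipping it up.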
With one test mbox which crashed I’ve removed all duplicate Message-IDs - but the crash also happened on the edited mbox.
I’ve imported 22 mboxes with “Group Conversation Threads” enabled, these are my results:
14 mailboxes (the largest 20,500 messages) imported without problems
8 mailboxes (all > 20,500 messages, the largest 33,000 messages) crashed at the very last moment when creating the conversation thread groups
I’ve split all the 8 problematic ‘mboxes’ by using vim - going to 50% of the file, find the next message and split the mbox file in part A and part B. I then re-imported the 8 mailboxes as 16 different A+B mbox files. They all imported and created groups without a problem. I haven’t changed any (duplicate) Message-IDs or something else.
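The same split can be scripted instead of done by hand in vim. The Python sketch below mirrors the steps above: it copies lines into part A until it reaches the first message boundary at or after the 50% byte offset, then writes the rest to part B. The function name `split_mbox` is made up, and the sketch assumes a message boundary is a line starting with `From ` that follows a blank line, which is the usual mbox convention but may not hold for every file.

```python
import os


def split_mbox(src_path, part_a_path, part_b_path):
    """Split an mbox in two at the first message start at or after
    the midpoint. Returns the byte offset of the split, or None if
    no boundary was found (part B is then empty)."""
    midpoint = os.path.getsize(src_path) // 2
    split_at = None
    with open(src_path, "rb") as src, \
            open(part_a_path, "wb") as out_a, \
            open(part_b_path, "wb") as out_b:
        out = out_a
        position = 0           # byte offset of the current line
        previous_blank = True  # was the previous line empty?
        for line in src:
            if (split_at is None and 0 < position
                    and position >= midpoint
                    and previous_blank
                    and line.startswith(b"From ")):
                split_at = position
                out = out_b
            out.write(line)
            position += len(line)
            previous_blank = line in (b"\n", b"\r\n")
    return split_at
```

The file is handled as bytes, so no message content is re-encoded, and concatenating part A and part B gives back the original file.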
I will see if I can find a way to easily replace the message content and values. I probably have to write some (Python) code to do that, so it might take a bit of time. Thanks for your help so far!
Would you suggest testing this by stripping all In-Reply-To: and References: headers in an mbox? That would be easy to test. But wouldn’t that mean that ‘group conversation threads’ would have nothing to do, as there simply wouldn’t be any threads anymore?