I purchased DT Pro to have a better archive tool for a decade of email and attachments. Previously I was using EagleFiler, and between EF and Apple Mail had 11 years' worth of files and attachments.
So, importing from Apple Mail to DT seemed to work well. Then I moved files from EF into Apple Mail and imported those into DT. This got to be a pain, but I made it work. Finally I needed to import all 55k sent messages. This is where I got stuck.
DT imported about half the files and then crashed. I tried again: another crash. I deleted the files from Apple Mail that were already imported and tried again. It imported about half the remaining files and crashed. I tried various options; each time it seemed to be working, and then it crashed again.
The current size of the DT database is 14.7 GB. I don't know if I am running into a size limitation or what. Basically stuck. I filed a report with DEVONtechnologies and, other than the automated reply, have heard nothing.
This is being done on a 17-inch MacBook Pro with 8 GB of RAM and a 1 TB hard drive.
Thanks, that's good to know. I didn't expect any limit other than hard drive size and memory, and figured I would run into those restrictions sooner or later.
Which brings me to a question: these limits are per what? Everything? 200k files per database? Per open database?
In other words, can I get DT to work with more files by splitting them up into several databases?
And if one approaches or even exceeds these limits, what consequences does it have? Will it just slow down DT significantly, or will it stop working altogether, or will it start crashing more and more, or …?
I have finally imported all my emails. To make this work I had to select emails from the folder, picking 2,000 or so at a time, and then import them using the import to DTPro command in the Mail > Message menu.
Doing the same thing with a larger number of files, say 4,000, caused DT to crash. I sent logs to Christian and he said there was no clear problem pointing to DT. The large size of the database (15 GB), 8 GB of RAM, and DT currently being 32-bit may all be contributing to the problem.
Nonetheless, all the files are in, and search seems to be working reasonably well even on the entire corpus. I look forward to the 64-bit version later this year.
EagleFiler stores emails in mbox files. There was no reason to import them back into Mail only to transfer them to DT. I suspect it would have been a lot more efficient to import the mbox files directly (File > Import > Email, select Unix Mailbox, then navigate to the EagleFiler library and select the mbox files). I've got more than 100,000 emails in both EagleFiler and DEVONthink, and all were imported as mbox files with about 15,000 emails per file. It all worked great. It would be interesting to know if importing the mbox files would have prevented your crashes, but I imagine you're sick of dealing with it at this point!
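For what it's worth, if a single huge mbox ever does cause the importer trouble, you can pre-split it into smaller mbox files and import those one at a time. This is just a sketch using Python's standard-library `mailbox` module; the file names and chunk size are hypothetical, not anything from DEVONthink or EagleFiler:

```python
import mailbox

def split_mbox(src_path, chunk_size, prefix="chunk"):
    """Split one large mbox into several smaller mbox files,
    chunk_size messages apiece, so each can be imported separately."""
    src = mailbox.mbox(src_path)
    out = None
    written = []  # names of the chunk files we create
    for i, key in enumerate(src.iterkeys()):
        if i % chunk_size == 0:
            # Starting a new chunk: close the previous one, open the next.
            if out is not None:
                out.flush()
                out.close()
            name = f"{prefix}_{i // chunk_size:03d}.mbox"
            out = mailbox.mbox(name)
            written.append(name)
        out.add(src[key])
    if out is not None:
        out.flush()
        out.close()
    src.close()
    return written

# Hypothetical usage, e.g. batches of 2,000 like the workaround above:
# split_mbox("Sent Messages.mbox", 2000)
```

Each resulting chunk is a normal mbox file, so the same Unix Mailbox import path should accept it.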
Good point. At the time I thought that to get the EF mail into the appropriate folder structure in DT I needed to import back to Mail and then go to DT. How big is your DT database and how much RAM do you have?
I have a separate DT database just for those emails. It’s 5.8GB on disk. I have 10GB of memory (Mac Pro). And there’s 108K emails.
I’m still evaluating whether I want to keep them in EagleFiler or DTPO.
DTPO stores each email as an individual file in the filesystem. The advantage to that is the ability to manage them, move them, and delete them as needed. The downside is that it really slows down all filesystem operations like backups etc. when you have 100,000 tiny little files in a handful of directories.
EF stores the emails in a few big mbox files which are basically read only. The advantages and disadvantages completely swap between the two products.
I also tried MailSteward but I discovered it was discarding duplicate emails that weren’t really duplicates. The developer told me it only compares the first 400 characters of the body (plus the To, From, and Date). It was detecting dupes where the first 400 characters were all HTML junk or some sort of form email with unique information toward the bottom. I asked him to add the option to disable duplicate detection completely and he said he’d consider it.
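To make the failure mode concrete, here's a toy illustration (not MailSteward's actual code, just the prefix-comparison idea as the developer described it): two form emails that share a long HTML preamble but differ in the details at the bottom look identical if you only compare the first 400 characters of the body.

```python
# Two form emails: identical HTML preamble longer than 400 characters,
# with the unique information at the bottom.
boilerplate = "<html><head><style>" + "p{margin:0;}" * 40 + "</style></head><body>"
msg_a = boilerplate + "Your confirmation number is 1111."
msg_b = boilerplate + "Your confirmation number is 2222."

def naive_dupe(a, b):
    # Prefix-only check, per the developer's description
    # (the real app also compares To, From, and Date).
    return a[:400] == b[:400]

print(naive_dupe(msg_a, msg_b))  # True: wrongly flagged as duplicates
print(msg_a == msg_b)            # False: the full bodies differ
```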
I suppose this depends on one's backup workflow. I see the advantages and disadvantages flip-flopped myself when using Time Machine backups. Adding a few emails to a DT database results in only the new files requiring backup, while adding a few files to a 5+ GB mbox file requires that the entire 5+ GB be backed up. This has always been the downside of email apps like Entourage and Postbox 1.x (perhaps Postbox 2.x also; I don't know if the database changed) that store the emails in one big database: they can quickly fill up a Time Machine hard drive. The DEVONthink 1.x database format worked similarly, and the 2.0 format's change to storing files individually was a very welcome one for me.
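The difference is easy to put in rough numbers. With made-up figures (average email size and message count are assumptions, not measurements from either app), adding ten emails changes the backup delta by orders of magnitude:

```python
# Rough backup-delta comparison after adding 10 new emails.
# All numbers are hypothetical for illustration.
avg_email_kb = 50           # assumed average message size
new_msgs = 10

# Per-message files (DT 2.x style): only the new files get copied.
per_file_delta_kb = new_msgs * avg_email_kb

# One monolithic 5 GB mbox: the whole modified file gets re-copied.
monolithic_delta_kb = 5 * 1024 * 1024

print(per_file_delta_kb)    # 500 (KB)
print(monolithic_delta_kb)  # 5242880 (KB, i.e. ~5 GB)
```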
Except that EF doesn't add new emails to the existing mbox files. Every time you import email it creates a new mbox file for the occasion, so your desire to only back up the new stuff still works. That's sort of what I meant when I said they were read only. Deleting an email in EF doesn't actually remove it from the mbox file; it just sets a flag in the database to ignore that particular email. EF is pretty Time Machine friendly.
I wasn’t trying to imply which approach was best, only that the advantages and disadvantages are 180 degrees from each other on the two products and I haven’t yet decided which one to keep. When I say it slows down backups I’m more referring to SuperDuper and rsync (or ChronoSync) backups that I run.