Is it possible to disable duplicate detection on import?

I’m trying to import an Outlook 2011 Mail archive, and there are many messages that are being flagged as duplicate/previously imported, and thus DTPO is not importing them. I haven’t been able to identify which messages it thinks are duplicates/previously imported, so I have no way to validate/trust this is correct (at least as far as I consider a message a duplicate). I’ve tried turning on the option “Previously imported will become replicants” in the Mail section of the DTPO preferences, but that doesn’t seem to have any impact.

Is there any way to (temporarily) disable the duplicate detection so that I can let DTPO import all the mail items, and then perhaps re-enable it later to review what it thinks are duplicates and process them as necessary?

I’m in the process of trying to bring over a fairly extensive email archive (150k messages) and I would really like to know that I’m not losing anything as it’s brought over, even if it means I have some duplicates to deal with after the fact.

Thanks,
Ed

Have you unchecked “Hide > Imported”?

Yes - I’ve had to do that when trying to re-import a second time. As far as I can tell, that just controls which emails are displayed in the import dialog. Should it do something else?

This is not possible – but it is possible to re-import mail by deleting the previous import or making a new import into a new database.

“Flagged” where - in the Import panel or in your database? If they are “flagged as duplicate”, then why is it not possible to “identify which messages it thinks are duplicates”? Sorry - I’m confused.

I’ve tried this (both deleting everything that was imported, and deleting the whole database to start from scratch). Each time, the same number of messages are flagged as duplicates/previously imported.

They are flagged in DEVONthink’s log window. The log window only displays a count for the number of messages that it believes were previously imported or are duplicates.

For example, I have a folder of build notifications. There are currently 2236 items in this folder. When I use DEVONthink’s mail import to bring in this folder, it ends up importing 2012 items into the DEVONthink database, and the DEVONthink Log window has an entry stating that 224 items are duplicates or were previously imported. From this log message, I can’t see any way to identify which messages DEVONthink believes are duplicates of which other messages.

I wouldn’t be surprised to have some duplicates, but 10% seems high, which makes me skeptical, and based on other threads about “duplicate detection”, it leads me to believe it’s treating similar messages as duplicates/previously imported and skipping them.

There isn’t, other than a brute-force counting and comparison routine, though perhaps a script could be written that would be more informative.

Never say never, but in years of mail importing, I’ve never seen a duplicate.

Are you importing from an Apple Mail or Outlook mailbox, or are you importing from a Unix mbox? You used the term “folder” for the source, so I wondered.

Outlook 2011. I’ve just moved from a Windows machine to a Mac. I’ve migrated email over into Outlook, and am now trying to move that archive out of Outlook and into DEVONthink.

I may need to look at writing up some sort of script to analyze the data and see if I really have 10% duplicates in this folder. I’ve tried importing another folder, and all messages were imported, so perhaps it is an issue with this folder only. I anticipate running into some challenges as I have one archive folder with 100k+ messages that I think I’m going to need to split up in order to bring them over.
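A rough sketch of what such an analysis script could look like, assuming the problem folder has first been exported from Outlook to an mbox file. It counts how many messages share a Message-ID header. Whether DEVONthink uses the Message-ID (or something else entirely) to decide what is "previously imported" is not known here, so this only shows how many true duplicates exist by that one criterion. The function names are made up for illustration.

```python
import mailbox
from collections import Counter


def duplicate_report(messages):
    """Count messages that share a Message-ID header.

    `messages` is any iterable of email.message.Message objects.
    Assumption: two messages with the same Message-ID are duplicates.
    This is NOT necessarily the rule DEVONthink applies.
    """
    counts = Counter()
    total = 0
    missing_id = 0
    for msg in messages:
        total += 1
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            # Messages without a Message-ID cannot be compared this way.
            missing_id += 1
            continue
        counts[str(msg_id).strip()] += 1

    # Keep only the IDs that occur more than once.
    duplicate_ids = {mid: n for mid, n in counts.items() if n > 1}
    # Number of messages beyond the first copy of each duplicated ID.
    extra_copies = sum(n - 1 for n in duplicate_ids.values())
    return {
        "total": total,
        "missing_id": missing_id,
        "duplicate_ids": duplicate_ids,
        "extra_copies": extra_copies,
    }


def report_for_mbox(path):
    """Run duplicate_report over an mbox file at `path`."""
    # create=False avoids silently creating an empty file on a wrong path.
    box = mailbox.mbox(path, create=False)
    try:
        return duplicate_report(box)
    finally:
        box.close()
```

Calling `report_for_mbox` on the exported build-notifications folder and comparing `extra_copies` against the 224 skipped items would show whether the skipped count is plausible. If `extra_copies` is far below 224, the skipped messages are probably not duplicates in the Message-ID sense.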

Side note - for the second folder that worked when importing, I first tried the option to “Archive Mailbox”. This resulted in roughly 2000 of 8000 emails coming over (the rest being marked as “previously imported”). I deleted everything and tried again, this time choosing Import, and everything was brought over. I found this interesting, and it wasn’t clear to me why the behavior would be different.

I’m pretty sure you need to empty the trash in order to clear the “already imported” status. Perhaps in the first case the trash was not emptied before the import?

I believe that was true (not emptying the trash) in one of the earlier cases, but I had repeated the process several times, some deleting and emptying the trash, some deleting the whole database and starting fresh.

I think at this point, my best option is to try importing some other folders and do some analysis on the folder that is currently having issues being completely imported.

I just tried another folder with 1152 items (a folder of sent email). When importing to DTPO, only 42 items were imported. The DTPO log window indicated 1110 were previously imported. This was the first time trying to import this folder into DTPO.