How do I get rid of duplicates and keep the original?

HenkvanEss · January 24, 2014, 10:21am

Is there a way to get rid of duplicates and keep the original? I have a folder with 92 duplicates and don’t want to delete half of them manually.

korm · January 24, 2014, 11:40am

There is no “original” – duplicates are exact copies of the file that was duplicated.

Use Scripts > Data > Move Duplicates to Trash but verify your results before emptying the Trash.

Mkcmobile · February 8, 2015, 6:05pm

I’m brand new to DT (Pro/Office) and have a similar question. I’m in the process of consolidating all my scattered sources into DT. I’m finding that I sometimes bring a bunch of files into the Inbox that I have already sorted into various folders in the database. I can see there are duplicates in the Duplicates smart folder. What I’d like to do is just trash the ones in the Inbox and keep the others filed where I have them. What’s the best way to do that? Highlight everything in the Duplicates smart folder and run the script that moves all duplicates to the Trash? I want to make sure I don’t trash the ones that are already sorted, or trash ones in the Inbox that aren’t duplicates (and still need to be sorted).

Thanks!

korm · February 9, 2015, 11:13am

Depending on the setting in DEVONthink > Preferences > General > Appearance > Mark duplicates and replicants in color, duplicates either have a name whose color is blue, or they have an icon indicating duplication.

Once you’ve determined an item is a duplicate, you can locate the other duplicates of this document by looking at the “See Also” portion of the “See Also & Classify” drawer (Data > See Also & Classify, or the See Also & Classify “hat” icon). In the See Also display, duplicates and replicants have a score of 100 – the score bar is totally green.

Or you can open Tools > Show Info and click the “Instances” button to see the path of each duplicate.

After investigating the duplicates, you can decide what to trash, what to move, what to ignore. You could use automation for this, but then DEVONthink would eventually delete something you want – without your auditing the decision in advance – and you would be disappointed.
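If you do want some help with the audit step, the decision rule being asked about (trash an Inbox copy only when an identical copy is already filed elsewhere) can be written down as a dry run that only produces a list to review. The sketch below is a generic illustration in Python, not DEVONthink's scripting interface: the function name `plan_inbox_cleanup`, the `/Inbox/` path prefix, and the use of SHA-256 to decide "identical" are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def plan_inbox_cleanup(items, inbox_prefix="/Inbox/"):
    """Return the Inbox locations that could be trashed, without deleting anything.

    items: iterable of (location, content_bytes) pairs (hypothetical input shape).
    An Inbox item is proposed only if a byte-identical copy exists outside the Inbox.
    """
    # Group locations by a hash of their content (assumed notion of "duplicate").
    groups = defaultdict(list)
    for location, content in items:
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(location)

    to_trash = []
    for locations in groups.values():
        filed = [loc for loc in locations if not loc.startswith(inbox_prefix)]
        inbox = [loc for loc in locations if loc.startswith(inbox_prefix)]
        # Unique Inbox items, and duplicates that exist only in the Inbox, are left alone.
        if filed and inbox:
            to_trash.extend(inbox)
    return sorted(to_trash)


if __name__ == "__main__":
    sample = [
        ("/Inbox/invoice.pdf", b"invoice"),
        ("/Finance/invoice.pdf", b"invoice"),
        ("/Inbox/unsorted-note.txt", b"still needs filing"),
    ]
    for location in plan_inbox_cleanup(sample):
        print("candidate for Trash:", location)
```

The point of returning a list rather than deleting is that you still get to review every candidate before anything goes to the Trash.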

Bill_DeVille · February 9, 2015, 7:38pm

Personally, I wouldn’t use an automated procedure to identify and delete duplicates. I’ve got too many examples of duplicates in my databases that are not identical copies, and that hold information that I don’t want to lose by their deletion.

Here are just two examples of cases where important information could be lost by trusting automatic deletion of duplicates:

Scenario 1: A collection of forms resulting from a survey. The contents differ by only a few words, so are identified by DEVONthink as highly similar, i.e., duplicates. I don’t want to throw any of them away.

Scenario 2: I’ve got some important documents scanned and OCRed into a database. Because of blemishes on the original paper copy, there were a number of text recognition errors. To improve search retrieval, I used Data > Convert > to Rich Text to create rich text files of the text content of those PDFs, then edited them to correct errors. Now I can reliably search for important terms in those documents. But DEVONthink designates the PDF and its rich text counterpart as duplicates. I don’t want an automated procedure to delete either of those “duplicate” documents.
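To make the distinction in these scenarios concrete: two texts can be nearly identical by a similarity measure and still not be exact copies. The Python sketch below is only an illustration of that difference. It does not reproduce how DEVONthink decides what counts as a duplicate, and the two survey strings and both function names are made up for the example.

```python
import hashlib
from difflib import SequenceMatcher


def is_exact_duplicate(a: str, b: str) -> bool:
    """True only when the two texts are character-for-character identical."""
    return hashlib.sha256(a.encode("utf-8")).digest() == hashlib.sha256(b.encode("utf-8")).digest()


def similarity(a: str, b: str) -> float:
    """Word-level similarity between 0.0 and 1.0, using difflib's ratio."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Two hypothetical survey forms that differ by a single word.
form_1 = "Survey response: the respondent prefers option A and lives in Utrecht"
form_2 = "Survey response: the respondent prefers option B and lives in Utrecht"

if __name__ == "__main__":
    print("exact duplicate:", is_exact_duplicate(form_1, form_2))
    print("similarity:", round(similarity(form_1, form_2), 2))
```

A tool that flags by similarity would treat these two forms as duplicates even though each one holds a different answer, which is the kind of loss described above.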

I don’t have a lot of duplicates in my databases, so I don’t have a storage problem resulting from excessive copies of documents. Once in a while I’ll come across duplicates that I don’t need, and will manually delete one of them. As noted above, I do need some duplicates and don’t want to arbitrarily delete them.

Mkcmobile · February 10, 2015, 3:20am

Thanks very much for the insights and detailed responses! Much to learn here.