How to Locate Duplicates to Delete Them

Hi there,

I’m trying to track down all the duplicate files in one database. I thought that putting the name of a file into search would find its duplicate copies, but I think I may have renamed some of these files (JPEGs, for example).

I’d like to find them and delete the ones I don’t need, but I can’t figure out from the manual how to do this — and can’t determine whether there is a premade AppleScript to help with this.

Thanks!

Robyn

I still haven’t found a way to easily and reliably search for duplicate (and replicant) database items, which is something I’ve wanted and requested for a long time. Here are some “tips” that may or may not be helpful:

Items with identical content (e.g. created using Data > Duplicate (cmd-D)) will have names displayed in bold and blue. If they haven’t been renamed, they should appear together in the History window, which is sometimes easier to scan when sorted by Name instead of Age. If they have been renamed but still have the same modification date, sorting by Age should keep them together.

So the general idea is to use the History window to scan for bold-blue duplicates (or red replicants). If you find dups that appear alone it may be possible to use content from them to search for others.

There’s a Data > Find & Remove Similar Contents… script under the application script menu (between the Window and Help menus). I haven’t looked at the source but it seems to do some kind of fuzzy content matching. Maybe that could be a starting point for a script to non-interactively list “real” duplicate items or somehow conveniently identify them.

I’m not sure why File > Database Properties… (opt-cmd-P) counts replicants but excludes duplicates.

Isn’t it because a duplicate isn’t stored but replicants are? E.g., if you have two files (fileA, fileB) that are the same and you import them both, DT really just imports one and adds a pointer from the second file name to the file. When you create a replicant, it is like a copy that exists separately from its clone.

Here’s how the Glossary in the Reference Guide describes replicants:

It doesn’t define duplicates, but I think they’re distinctly separate database entries which happen to have identical, but unshared, content. I don’t know how DT keeps track of duplicates so it can bold-blue highlight their names. Because it has that capability, I’ve wondered why the duplicate item count isn’t displayed in Database Properties, and also why there isn’t a way to explicitly search for duplicates (and replicants, which are red highlighted and counted in Database Properties).

My hunch (and hope) is that one positive side effect of backend changes for DEVONthink (Pro) 2.x will be to reduce some of the current frontend “voodoo” required to effectively and efficiently manage databases.

Actually, if I’m reading you correctly, it’s kind of the opposite. Replicants are actually separate references to the same file while duplicates are distinct entries that are stored separately. I could be wrong, but this is how I’ve seen them talked about.

Duplicates don’t always have the same name, either, which is why they can be hard to find in searches. For example, I had two graphics files with exactly the same content and two different names, and it made it hard to track them down.
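For files that live in (or have been exported to) an ordinary folder, one way to catch same-content, different-name copies outside DEVONthink is to group files by a hash of their bytes. This is only a rough sketch under that assumption: the function name `find_duplicate_files` is made up for illustration, it matches byte-identical files only, and it is not a description of how DEVONthink itself detects duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root):
    """Return lists of files under `root` whose contents are byte-identical.

    Hypothetical helper: file names are ignored, so renamed copies still match.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Hash the file contents; identical bytes give identical digests.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the digests shared by two or more files.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_files` on the folder in question returns one list per set of identical files, so you can review each set and decide which copies to keep. It reads each file fully into memory, which is fine for typical images but may need chunked reading for very large files.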

The easiest way to find duplicates is probably to open/display the document and to use “See Also”. The drawer should list all additional duplicates. Then select the unnecessary ones, open the contextual menu and choose “Delete”.

This script might also be useful, as it replicates all duplicates to a new group called “Duplicates”:


-- Find Duplicates.
-- Created by Christian Grunenberg on Mon Apr 24 2006.
-- Copyright (c) 2006. All rights reserved.

tell application "DEVONthink Pro"
	try
		set theDuplicates to every content of current database whose number of duplicates is greater than 0
		if (count of theDuplicates) is greater than 0 then
			set theGroup to create record with {name:"Duplicates", type:group}
			repeat with theDuplicate in theDuplicates
				replicate record theDuplicate to theGroup
			end repeat
		end if
	on error error_message number error_number
		if the error_number is not -128 then
			try
				display alert "DEVONthink Pro" message error_message as warning
			on error number error_number
				if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1
			end try
		end if
	end try
end tell

That’s limited to the kinds of items that appear in the See Also drawer. URLs (at least) don’t but images do. Robyn’s original post mentions jpegs so this method could be helpful for locating those.

Very useful - thanks!! I’ll use it to help locate which items were converted from replicants to duplicates during the process of importing a fully exported database into a newly-created database. And I want to construct a minimal database as an example to demonstrate how items/counts can differ after that kind of export->import process (like what Tools > Rebuild Database… does).