Script for deleting duplicates in a recently added folder

I am consolidating hundreds of thousands of files into a dozen or so DT2 databases, and after years of backups and saving to multiple drives, there are tons of duplicates.

So let’s say I had previously imported the file “FOO” into a DT database, and now I am about to bring a Finder folder from a backup disk into the same database, and it happens to contain a copy of “FOO”.

I want to be able to know if “FOO” was already there - and DT is better than anything else for getting rid of such duplicates, thanks to its very neat ability to color duplicates blue.

One problem, though: if the newly imported folder has hundreds, if not thousands, of blue files, then deleting them by hand is a royal pain. But there is a better way: use my script (at the bottom of this post).

BEFORE YOU PROCEED!

Make sure you know what you are doing. This script doesn’t offer any protection against mistakes. Use with caution.

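Add all the Finder folders you want to check to an “EXAMINE” folder inside the INBOX of your chosen DT2 database.

Don’t delete the original Finder folders yet.

In DT, select each of the new folders under “EXAMINE” in turn, so that it becomes the current group (the script works on the children of the current group), and run the script.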

It won’t take too long to delete every file that is blue. What remains theoretically had no duplicates in the database.

Double check the remaining files, and if you are confident in the result (I have been), you can now delete the original Finder folders.

I triple check everything, because I found out that DT2 sometimes doesn’t color a file blue when it should (this has been happening with some GIFs and PNGs, but not always). Still, when a DT2 file is blue, you can be sure it is a duplicate.

SCRIPT

-- Delete every item in the current group that has at least one duplicate
tell application "DEVONthink Pro"
	-- Only direct children of the current group are checked, not subfolders
	set theList to children of current group
	repeat with y in theList
		if (number of duplicates of y) is not 0 then
			delete record y
		end if
	end repeat
end tell
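
Since the script deletes without asking, a dry run can help with the "use with caution" advice above. The following is only a sketch, assuming the same "DEVONthink Pro" application name and the same `children of current group` approach as the script above: it counts the items that have duplicates and reports the number, without deleting anything. The variable name `dupCount` is made up for this example.

```applescript
-- Dry run: count items with duplicates in the current group, delete nothing
tell application "DEVONthink Pro"
	-- Same scope as the delete script: direct children of the current group
	set theList to children of current group
	set dupCount to 0 -- hypothetical counter, not part of the original script
	repeat with y in theList
		if (number of duplicates of y) is not 0 then
			set dupCount to dupCount + 1
		end if
	end repeat
	display dialog (dupCount as string) & " of " & ((count of theList) as string) & " items have duplicates." buttons {"OK"} default button 1
end tell
```

If the reported number looks plausible for the folder you just imported, the delete script can then be run on the same group.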

enjoy,

mike

Here’s a similar script which will be part of the next beta. This script moves duplicates from the selection (e.g. results of a smart group to find duplicates) to the trash.


-- Move duplicates from selection to trash
-- Created by Christian Grunenberg on Fri Feb 20 2009.
-- Copyright (c) 2009. All rights reserved.

tell application id "com.devon-technologies.thinkpro2"
	activate
	try
		set this_selection to the selection
		if this_selection is {} then error "Please select some contents."
		repeat with this_record in this_selection
			if number of duplicates of this_record > 0 then move record this_record to trash group of current database
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

Tx Christian - yours is better :wink:

mike

Thanks a lot for this great script. I understand the same principle applies in DT3.
My question is the following: if I have two folders that contain the same 2 files (duplicates) and want to preserve the copies in one -specific- folder, is there a way to do this?
My understanding of the script is that it will delete the most recent file.

The script processes only the selected items, therefore selecting only the items in one group should work as desired.
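
If the selection is likely to span several groups, the same idea can be made explicit in the script itself. This is a rough sketch, not the shipped script: it assumes DEVONthink 3 (application id "DNtp") and that each record's `location` property holds the path of its parent group inside the database. The path in `discardLocation` is a made-up example and has to be replaced with the group whose copies should go to the trash; copies anywhere else are left alone.

```applescript
-- Hypothetical path of the group whose duplicate copies should be trashed
property discardLocation : "/Inbox/EXAMINE/"

tell application id "DNtp" -- assumption: DEVONthink 3
	try
		set this_selection to the selection
		if this_selection is {} then error "Please select some contents."
		repeat with this_record in this_selection
			-- Trash only duplicates that live in (or below) discardLocation
			if (number of duplicates of this_record) > 0 and (location of this_record) starts with discardLocation then
				move record this_record to trash group of current database
			end if
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
	end try
end tell
```

Because the check uses `starts with`, items in subgroups of the discard group are affected as well. Try it on a test database first.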