Validating duplicates: a starting point script

At times, DEVONthink flags a document as a “duplicate” when in fact that document does not appear to have any real duplicates. I get this all the time with Curio documents - on my machine, DEVONthink considers every Curio document to be a duplicate of every other Curio document. Since I know that I’ve not duplicated any Curio document, I just ignore this quirk.

I have a small script that compares the “data” property of every alleged duplicate of a selected record to the “data” property of the selected record. In other words, if I select document “A” and DEVONthink considers “B” and “C” to be duplicates, then “A” has two duplicates (not including itself). The script looks at the “data” property of “B” and compares it to the data of “A”, then does the same for “C”. If the “data” is equal, I count the document as a true duplicate; otherwise it is not a true duplicate.

Note: this is a stupid brute force analysis - it will not be true in many cases, but I think it is truthier more often than the alternative 8) .

I offer this script as a kind of stub that interested scripters can extend as desired. For example, rather than displaying a report, you could add a special tag, or label, or comment, to “true duplicates”, and use that fact in a smart group.

I’d be interested in seeing what others do with this concept.

tell application id "com.devon-technologies.thinkpro2"
	-- Require exactly one selected record
	set theSelection to selection
	if theSelection is {} then error "Select one item"
	if the number of items in theSelection is not 1 then error "Select only one item"
	repeat with thisItem in theSelection
		if number of duplicates of thisItem is not 0 then
			set myDuplicates to (duplicates of thisItem as list)
			set numDupes to the length of myDuplicates
			set numSameData to 0
			set thisData to the data of thisItem
			set theReport to ""
			repeat with thisDuplicate in myDuplicates
				-- Count a duplicate as "true" only if its data equals the selected record's data
				if the data of thisDuplicate is equal to thisData then set numSameData to numSameData + 1
				-- The report lists every reported duplicate, whether or not its data matched
				set theReport to theReport & (the location of thisDuplicate & the name of thisDuplicate) & return & return
			end repeat
			display dialog "DEVONthink reports there are " & numDupes & " duplicates. Of these, " & numSameData & " have the same data" & return & return & "The reported duplicates are: " & return & return & theReport
		else
			error "DEVONthink is reporting 'no duplicates'"
		end if
	end repeat
end tell
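
As one possible way to extend the stub along the lines suggested above, here is a rough sketch that tags the duplicates whose data matches instead of showing a report. The tag name "true-duplicate" and the names trueDuplicateTag and theTags are my own placeholders, not anything DEVONthink defines, and I am assuming the "tags" property of a record can be read and set as a list of text. Treat it as a starting point rather than a finished tool.

```applescript
-- Sketch: tag duplicates whose data equals the selected record's data.
-- Assumption: "true-duplicate" is a hypothetical tag name; choose your own.
property trueDuplicateTag : "true-duplicate"

tell application id "com.devon-technologies.thinkpro2"
	set theSelection to selection
	if (count of theSelection) is not 1 then error "Select exactly one item"
	set thisItem to item 1 of theSelection
	set thisData to the data of thisItem
	set numSameData to 0
	repeat with thisDuplicate in (duplicates of thisItem as list)
		if the data of thisDuplicate is equal to thisData then
			-- Add the tag only if the record does not already carry it
			set theTags to tags of thisDuplicate
			if theTags does not contain trueDuplicateTag then
				set tags of thisDuplicate to theTags & trueDuplicateTag
			end if
			set numSameData to numSameData + 1
		end if
	end repeat
	display dialog "Found " & numSameData & " duplicate(s) with the same data; each now carries the tag " & trueDuplicateTag
end tell
```

A smart group that searches for that tag would then collect the “true duplicates” in one place. Like the original, this only compares the data of the selected record against its reported duplicates, so it inherits the same brute force limitations.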


What’s displayed after switching to the alternate text view in this case? Is the text always identical?

No, of course not. The text is always different. The files are never duplicates, as I said.

Could you please send me a few examples so that I can check this over here? Thanks in advance!