Duplicates not listed in Get Info or in Inspector

In DT Pro 4.0.2, I do this:

  • Select a document
  • Create a duplicate to be located in another database
  • Navigate to that database to verify that the duplicate document has indeed been created

Then there’s a problem: No duplicates are shown for the document either in the Inspector (Info / Generic / Instances) or, when I right-click the document, under Get Info / Generic / Instances.

Am I missing something? Thanks.

(Maybe not a bug, probably user error)

The recognition of duplicates is limited to the same database.

Ah, OK. Thank you. Is there another quick way to locate a duplicate of a document across all of my open databases?

Here is an approach that uses the content hash of a selected document…

tell application id "DNtp"
	if (selected records) is {} then return -- Nothing selected, so nothing to look up
	set ch to content hash of (selected record 1)
	set theList to {}
	repeat with theDB in (databases)
		set matchingDocs to (contents of theDB whose content hash is ch)
		if matchingDocs is not {} then
			repeat with theRecord in matchingDocs
				copy ((name of theDB) & ": " & (location with name of theRecord) & linefeed) to end of theList
			end repeat
		end if
	end repeat
	return theList as string
end tell

It just outputs a list in Script Editor with the name of the database and the location and name of the matching document.

Thank you, Jim. I’ll have fun experimenting with this. The script works as advertised, but I’m having to jump through a couple of hoops to see the result in the Script Editor description panel. While my boss is not looking, I’ll do some homework and see if I can do a better job implementing the script.

You’re welcome.
And you didn’t say what your intention was, i.e., why you want to know or what to do with the information, if found. Lacking those details, a list seemed sufficient.

The list is helpful. My need is to be able to see if I’ve created a duplicate or duplicates of a given file in another database, and to see where those duplicates are located.

To be more specific, as I read an academic article, often I’ll have a document that I’ve created as an extended note on a specific point made by the article’s author.

Sometimes I want that same document also to appear in a different database where I keep track of possible topics for talks, lectures and such.

As I was going through a list of those “talk” topics today, I had hoped to be able to find the other location of a given document.

I can fall back on inserting a link from a document to its duplicate, and vice versa, but a list is also helpful. I only wish I were able to pull up that list of duplicates more smoothly.

Thank you for your help.

This version generates a Markdown document in the same group as the selected document.
Both the database and the duplicated document are active hyperlinks.

tell application id "DNtp"
	if (selected records) is {} then return
	
	set theList to {}
	
	set {recName, ch} to {name, content hash} of (selected record 1)
	set matchName to (recName & " Duplicates" as string) -- Vary this name here as needed
	set hasMatchDoc to false -- Assume no document with matches exists
	
	repeat with theDB in (databases)
		set matchingDocs to (contents of theDB whose content hash is ch)
		if matchingDocs is not {} then
			
			if not hasMatchDoc then -- Create the match document the first time duplicates are found in a database.
				set docExists to exists child matchName in current group
				if not docExists then
					set matchesDoc to create record with {name:matchName, type:markdown, content:""} in current group
				else
					set matchesDoc to (child matchName in current group)
				end if
				set hasMatchDoc to true
			end if
			
			repeat with theRecord in matchingDocs
				copy "[" & (name of theDB as string) & "](x-devonthink-item://" & (uuid of theDB as string) & "): " & "[" & (location with name of theRecord) & "](" & (reference URL of theRecord) & ")  " & linefeed to end of theList
			end repeat
		end if
	end repeat
	
	if hasMatchDoc then -- Only write and open the list if matches were found
		update record matchesDoc with text (theList as string) mode replacing -- Update the document with matches' text
		open tab for record matchesDoc -- Open it, as a convenience
	end if
end tell

If an existing document listing matches is found in the current group, its content is replaced with the most recent matches so you don’t end up with a bunch of duplicates.

@Charles56: Better? And yes, it’s not pretty but it’s functional :wink:

That’s very slick, Jim. I like it that the results pop up in a Markdown document and that each duplicate document appears as a clickable link. This is what I was looking for. Collect $200 as you pass Go.

PS:

Note the first line of the results. It is a match even though the name is different, because the script compares only the content hash. I don’t know how often people would duplicate a document to another database and rename it, but it’s something to be aware of. However, the content hash is a very specific value: one tiny change to the content changes the hash. So if someone renamed the file and also added a return to its content, it would no longer match.
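
To check two specific documents directly, a minimal sketch along these lines should work. It relies only on the content hash property already used in the scripts above; the two-item selection check and the variable names are just for illustration.

```applescript
tell application id "DNtp"
	set sel to selected records
	-- This sketch expects exactly two selected documents
	if (count of sel) is not 2 then return "Select exactly two documents."
	set {recA, recB} to sel
	set hashA to content hash of recA
	set hashB to content hash of recB
	if hashA is hashB then
		-- Names are ignored; only the content is compared
		return "Same content hash: " & (name of recA) & " / " & (name of recB)
	else
		return "Different content hashes."
	end if
end tell
```

Run it from Script Editor with two documents selected. A renamed copy should report the same hash, while a copy with even a single added return should not.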

This is a very cool little script.
I am curious, is the content hash how duplicates are recognized in DEVONthink within a database?
Or is it a combo of that and other factors?

Thanks!

Duplicates are first detected by similarity of content. For example, if you convert a Markdown document to PDF, the two will be detected as duplicates. That’s how it works in 3.x as well.

But if you enable Files > General > Stricter recognition of duplicates, the file size, file type, and content hash are used. This is how to detect file duplicates.

By the way, this script makes use of a new AppleScript command @cgrunenberg added in 4.0 Copernicus: update record. Inserting (at the beginning), appending (at the end), or completely replacing text in a document is so much easier and more powerful now.

And yes, that’s a bit of a teaser as I know you’re not yet on 4 :wink:
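
For anyone curious what another mode might look like, here is a minimal sketch. Only the replacing mode appears in the script earlier in this thread; the mode name appending is an assumption based on the wording above, so check the scripting dictionary in 4.x before relying on it. The appended note text is just a placeholder.

```applescript
tell application id "DNtp"
	if (selected records) is {} then return
	set theRecord to selected record 1
	-- Assumption: "appending" is the mode name for adding text at the end
	update record theRecord with text ("Checked for duplicates." & linefeed) mode appending
end tell
```

Unlike the replacing mode used in the duplicates script, this would leave the existing text in place and add a line at the end of the selected document.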

Love this idea, @BLUEFROG, might use it if I leave e-mails and their attachments in a separate database, but want to know where else the attachments reside.

I’m presuming this could be amended to place those links in an Annotation to the selected file?

I’m not asking for such a mod, just confirmation it’s possible?

I’m just lazy and away from my computer to check the AppleScript dictionary right now – but given everything I have seen, I’d presume that it is possible.

Sean

Glad it inspires you! :smiling_face:

Yes, it’s possible.

I have a (large) number of files that I just imported into a database. I know that many of them are duplicates, but they don’t show up as duplicates. Same filename, same type, same size, same word count. “Stricter recognition of duplicates” setting is off. Using the scripts above shows that they are duplicates.

Is there a way to get DTP to recognize these internally?

The filename doesn’t matter. But are the files and their duplicates located in the same database?

Yes.

What’s the kind of the files?

Mostly PDF+Text files. A variety of apps created them, but the dupes have the same Creator/Producer.

I turned on Word Count, and some of the dupes have different counts, even though they’re the same, but some have the same word counts.