Find Duplicates

Jweber · October 24, 2005, 1:21am

Sometimes when I’ve imported a number of files I forget that I’ve already imported one and get the file name showing up in blue to indicate the dupe. What I’d like to do is find the dupe. The only script I can find is Find and Remove Similar Contents. I don’t know if I want to remove it, I just want to find it! Sometimes I have no idea what group the other one is in. Maybe it would be nice to have just a Find Duplicate script and the group where the dupe is located would be highlighted.

Thanks.

JFiedler · October 24, 2005, 9:24am

Yep, that would be nice, because DT already knows which file is a duplicate.
And for convenience, a split screen with the selected dupe should pop up, including the exact path and comments and so on, so it would be easy to choose which dupe to delete.

But this topic should be in
devon-technologies.com/phpBB … a2b18fe061
Any mods around here?

Bill_DeVille · October 24, 2005, 4:39pm

For some reason, I ended up with multiple copies of the bookmark for CNN.com. How did I know that? Because the title of my normal bookmark for that site turned blue.

So where’s that pesky duplicate?

Here’s how I found it:

Since it’s a bookmark file, it has 0 bytes. The only ‘content’ is in the Name and URL fields in its Info panel. So that’s what I should focus on in Tools > Search. I entered the search string “cnn” and the Search options URL/Path, All Words, ignore case, any State, any Label, database-wide.

Result: 123 items found in 0.025 seconds. Now, I want to see the duplicate files. What’s my next strategy to do that quickly?

Sort Search results by file size. Duplicates can have different names, but they will have the same file size. That’s the trick. The duplicate CNN bookmark resulted from the fact that I had imported the DT Pro Tutorial database into my main database while doing edits. The Search window, now that I had sorted results by file size, brought the duplicate files into proximity and I could identify the location of the duplicate. In this case, I didn’t delete the duplicate, as I will eventually delete the entire group containing the edited Tutorial database files (after exporting them).

Tip: Note the file size of a document that’s duplicated. Use a search strategy that will show the duplicates in the results list, then sort results by file size and look for your blue documents in that size range.
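The same size-based trick can be tried outside the application. Below is a minimal sketch, assuming the documents exist as ordinary files on disk (for example in an exported folder); it does not use any DEVONthink API, and the function names `group_by_size` and `files_under` are made up for illustration. Files that share a size are only candidates, since unrelated files can happen to have the same size.

```python
import os
from collections import defaultdict


def group_by_size(paths):
    """Return {size_in_bytes: [paths]} for every size shared by 2+ files."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    # Keep only sizes that occur more than once. These are duplicate
    # candidates, not confirmed duplicates.
    return {size: sorted(group) for size, group in by_size.items() if len(group) > 1}


def files_under(folder):
    """Yield every file path below folder (hypothetical export location)."""
    for root, _dirs, names in os.walk(folder):
        for name in names:
            yield os.path.join(root, name)
```

Calling `group_by_size(files_under("/path/to/export"))` with a folder of your own gives one list per shared size, which mirrors sorting the Search results by file size and looking at neighbouring rows.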

Maria · October 24, 2005, 10:50pm

Bill,

That is of course a workaround, but it should be built into DT. I asked for that long ago and, as far as I understand, it will be solved with 2.0. I do not understand why it was so difficult to implement under the old file system, …

Best,
Maria

Bill_DeVille · October 25, 2005, 12:35am

Hi, Maria:

It’s one of hundreds of things on the “to do” list.

But notice how many things have been added in the recent releases?

Maria · October 25, 2005, 1:34am

Bill,

Maria

highchecker · November 13, 2005, 2:08am

I think quickly finding and excluding duplicates is a top topic; it would be nice to have that soon.

Jweber · December 18, 2008, 9:17pm

I figured I’d tag this on to my original email. Has anything been added to 2.0 to help find Duplicates and Replicants? I can’t seem to find anything (yet).

And along the same theme I thought a simple solution would be a search using the name of a replicant that might bring up all instances of the file. But it doesn’t. I only get one. Why’s that?

cgrunenberg · December 19, 2008, 9:27am

See Data > New > Smart Group… and choose Instance - is/is not - duplicate/replicant

Jweber · December 19, 2008, 12:41pm

For my test I have a car title replicated in 2 separate places in my db. One is in a group Documents/Autos and the other is in a group MINI.

If I set up a Smart Group as you said (plus using the name Title_MINI), it finds only one instance of the Replicant. And the path it shows (pasted below) is its location with the physical files on the hard drive. But showing it’s in files.noindex/pdf doesn’t really help me to see what group it’s in up here in the real world of DT. Is there a way for it to show both instances with each path being the group it’s contained in?

cgrunenberg · December 19, 2008, 1:01pm

The search results are correct. Only one item is listed, as all replicants are the same item, whereas a search for duplicates lists one or more items, as duplicates are not the same item.

Finally, it’s not yet implemented but the final version should support Go > Previous/Next Instance for smart groups too.
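One way to picture the distinction, and what a "show the group" view would need, is the toy model below. It is only a sketch under assumptions: groups are modelled as a plain dictionary of lists, a replicant is the same Python object placed in two groups, and a duplicate is a separate object with equal content. The `Item` class, the `locations` function and the `Inbox` group are hypothetical and say nothing about how DEVONthink stores its data.

```python
class Item:
    """A document with a name and some content (toy model)."""

    def __init__(self, name, content):
        self.name = name
        self.content = content


def locations(groups, name):
    """Return [(item, [group paths])] for each distinct item called name."""
    found = {}
    for group_path, items in groups.items():
        for item in items:
            if item.name == name:
                # Key by object identity: replicants collapse into one entry,
                # duplicates (separate objects) get their own entry.
                found.setdefault(id(item), (item, []))[1].append(group_path)
    return list(found.values())


title = Item("Title_MINI", b"pdf bytes")
copy = Item("Title_MINI", b"pdf bytes")  # duplicate: separate item, same content

groups = {
    "Documents/Autos": [title],
    "MINI": [title],   # replicant: the very same item in a second group
    "Inbox": [copy],   # hypothetical group holding the duplicate
}

result = locations(groups, "Title_MINI")
for item, paths in result:
    print(item.name, paths)
```

In this model a search returns the replicated item once, which matches the behaviour described above, while the list of group paths attached to it is the information being asked for.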

Jweber · December 19, 2008, 1:05pm

But what about showing the Group? What would next instance tell me if the result only shows the location on the hard drive?

cgrunenberg · December 19, 2008, 1:07pm

Go > Previous/Next Instance switches between the various locations of a replicant and therefore the same file/group.

Jweber · December 19, 2008, 6:16pm

That doesn’t seem to work. Previous/Next does not bring up the other replicant, so I’m confused there. I’m left with just the one result (the original document with the Finder path name). And once again that path name is not associated with any group. It would be nice if searches and smart groups showed the group/path/name as an optional result.

Jweber · May 5, 2009, 2:09pm

I’m still hoping some sort of simple ability to find the various locations of Duplicate and Replicant instances will be added.
I set up a Smart Group for Duplicates, which does list multiple instances but not their locations. Even breadcrumbs at the top of the page would help. Or a contextual menu item, “Show Location in Group”, that could be used for each instance alongside the current “Show in Finder”.

Previous Instance is not helpful:

Jweber · May 5, 2009, 2:34pm

OK, one workaround for this is to open the Info window and look at the Tags when selecting an item to show its location. The only catch I find is that the number of subgroup levels listed in the tags is limited. I think this may just be because the area the tags appear in is too small to show many tags whose names, added together, are too long to fit.
This would be close to breadcrumbs if it worked without the apparent space limitations.

twicks · May 5, 2009, 7:57pm

Breadcrumbs! Yes! We want breadcrumbs! Absolutely must have it! A way to backtrack several documents back simply by clicking the appropriate breadcrumb. I knew there was something vitally important and necessary missing from DTPO.

Tongue only partly planted in cheek.

Seriously, breadcrumbs would be a nice future feature.

Jweber · May 5, 2009, 9:03pm

Geez I wasn’t saying it would be vital, just a possible answer to the issue of where a file was located. If you can easily see where a file in a Smart Group is nested in the db another way I’m all ears!

ntahall · May 23, 2009, 7:02pm

Hi

There’s another problem with finding duplicates, which is that DT identifies false duplicates.

I have a number of examples where, on inspection, the files are not identical although they are similar.

What methods is DT using to identify duplicates?

Regards
Nick

annard · May 23, 2009, 8:49pm

It uses approximation techniques to find out how similar documents are and then decides whether it is a duplicate or not.
If you think this can be improved, send the documents to support@devon-technologies.com and then we can take a look at it and possibly enhance this process.
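The thread does not say which approximation DT uses, so the following is purely an illustration of how any similarity threshold can produce false duplicates. It swaps in a simple word-set (Jaccard) comparison; the functions `word_set`, `similarity` and `looks_like_duplicate` and the 0.9 threshold are assumptions made for this sketch, not DEVONthink's method.

```python
def word_set(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def similarity(a, b):
    """Jaccard similarity of the two word sets, between 0.0 and 1.0."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 1.0  # two empty texts are treated as identical
    return len(wa & wb) / len(wa | wb)


def looks_like_duplicate(a, b, threshold=0.9):
    """Flag a pair as duplicates when similarity reaches the threshold."""
    return similarity(a, b) >= threshold


# Two 20-word texts that differ in exactly one word.
doc1 = " ".join(f"w{i}" for i in range(20))
doc2 = " ".join([f"w{i}" for i in range(19)] + ["changed"])
```

Here `doc1` and `doc2` are not identical, yet they share 19 of 21 distinct words and so pass the threshold. Any approximate comparison has this trade-off: a looser threshold catches more real duplicates but also flags more near-identical files.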