I’m dealing with ~40k duplicates in one database of ~60k files (old archives). I see a “Want to delete duplicates” topic in Automation from 6 years ago. Their situation was similar to mine: they wanted to retain one of each of the duplicates and delete the rest. The posted advice was to consider using the Move Duplicates to Trash script, but there was no follow-up post to say how that worked for the poster. Also, I’m running DT4 while the poster was running DT3.
As I understand it, the Duplicates smart group shows all instances of files it considers to be the same. So if there are 3 “identical” files, Duplicates shows all 3 and doesn’t try to pick out which one might be the “master file”, as it has no way of knowing which one I would prefer to retain. I’ve confirmed with Finder that Duplicates does contain all of the file instances.
I’m not a coder, but when I looked at the Move Duplicates to Trash script I couldn’t tell whether it just deletes all the selected files or somehow leaves one in place and removes the rest. It’s just not practical for me to look through 40k files and decide which one I want to keep. I’d be willing to just keep the newest one. Would the Move Duplicates to Trash script accomplish this?
- If you want to ensure these are true duplicates, you need to enable Files > General > Stricter recognition of duplicates.
- There is no master file in this scenario, except maybe in your mind. The script will retain the last duplicated/imported file and trash the others related to it.
And don’t forget you’ll need to empty the database’s Trash to fully remove them.
Thanks Bluefrog. I enabled Stricter recognition as you recommended, and the number of duplicates went down. Then I ran a test to see how things worked. I selected 2 duplicates (same file name) and ran the Move Duplicates to Trash script. They dropped off the Duplicates list. I emptied the Trash and the number of duplicates dropped accordingly. But when I checked Finder, both files were still there. Note that this DT4 database is an index of my Archive folder. I figured I must have done something wrong, so I repeated the test. This time I didn’t get the prompt noting that these were copies of files and asking whether to delete them from the database only or delete the files as well, which I thought curious. Anyway, the result was the same: they dropped off the Duplicates list, but this time the files didn’t show up in the DT Trash, and all of the files still exist in Finder. What’s wrong with my workflow here?
I probably need to just get a decent duplicate finder that would allow me to specify the criteria for selection, such as earliest or latest modification date. I have “dupe” for my Mac, but when I ran it for this particular folder, it came up with 8k, not the 30+k shown in DT4 duplicates.
That suggests these files were “indexed”, not “imported”. My hunch is that the script you ran only handles files “imported” to DEVONthink and takes no action on files outside its control and responsibility. I’m guessing, and I’m away from my computer so I haven’t looked into it, but perhaps you can.
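For the “just keep the newest one” criterion mentioned above, here is a minimal sketch of how that selection could be done directly on the indexed folder, outside DEVONthink. It is not the Move Duplicates to Trash script and does not use any DEVONthink API. The function names (`file_digest`, `plan_duplicate_removal`) are made up for illustration, and it assumes “duplicate” means byte-identical content (SHA-256 match) and “newest” means latest modification time, so its counts may well differ from what the Duplicates smart group shows. It only builds a list and deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_duplicate_removal(root):
    """Group files under `root` by content and pick the newest of each group.

    Returns a list of (keep, extras) tuples, where `keep` is the most
    recently modified file and `extras` are its byte-identical copies.
    Nothing is deleted or moved here.
    """
    # Cheap first pass: only files of equal size can be identical.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    plan = []
    for paths in by_hash.values():
        if len(paths) < 2:
            continue
        # Newest modification time first; that one is kept.
        paths.sort(key=lambda p: p.stat().st_mtime, reverse=True)
        plan.append((paths[0], paths[1:]))
    return plan
```

Calling `plan_duplicate_removal("/path/to/Archive")` (the path is a placeholder) gives a list you can print and review before moving anything to the Trash by hand. Since the database indexes that folder, it would be worth testing on a copy first to see how DEVONthink reacts when indexed files disappear.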