Deleting 40k Duplicates in One Database

I’m dealing with ~40k duplicates in one database of ~60k files (old archives). I see a “Want to delete duplicates” topic in Automation from 6 years ago. Their situation was similar to mine: they wanted to retain one of each of the duplicates and delete the rest. The posted advice was to consider using the Move Duplicates to Trash script, but there was no follow-up post to see how that worked for the poster. And I’m running DT4 while the poster was running DT3.

As I understand it, the Duplicates smart group shows all instances of files it considers to be the same. So, if there are 3 “identical” files, Duplicates shows all 3; it doesn’t try to discriminate which one might be the “master file”, as it has no way of knowing which one would be the preferred one to retain. And I’ve confirmed in Finder that Duplicates does contain all of the file instances.

I’m not a coder, but when I looked at the Move Duplicates to Trash script I couldn’t tell if the script just deletes all the selected files, or if it somehow leaves one in place and removes the rest. It’s just not practical for me to look through 40k files and decide which one I want to keep. I’d be willing to just keep the newest one. Would the Move Duplicates to Trash script accomplish this?

  • If you want to ensure these are true duplicates, you need to enable Settings > General > Stricter recognition of duplicates.
  • There is no master file in this scenario, except maybe in your mind. The script will retain the last duplicated/imported file and trash the others related to it.

And don’t forget you’ll need to empty the database’s Trash to fully remove them.
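For anyone curious how a “keep one, trash the rest” pass can decide which copy survives, here is a rough Python sketch of that logic. This is not the actual DEVONthink script (which works inside the database, not on the filesystem), and the function name is made up for illustration; it groups files by content hash and marks everything except the most recently modified copy of each group for deletion.

```python
# Hypothetical sketch, NOT the DEVONthink "Move Duplicates to Trash" script:
# group files under a folder by content hash, keep the newest copy of each
# group, and return the paths that would be deleted.
import hashlib
from collections import defaultdict
from pathlib import Path


def plan_duplicate_deletion(root: Path) -> list[Path]:
    """Return every file except the most recently modified copy per content hash."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)

    to_delete: list[Path] = []
    for paths in groups.values():
        # Newest modification time first; everything after index 0 is redundant.
        paths.sort(key=lambda p: p.stat().st_mtime, reverse=True)
        to_delete.extend(paths[1:])
    return to_delete
```

Returning a deletion plan instead of deleting immediately mirrors what the DEVONthink script effectively gives you: the “deleted” items land in the database’s Trash first, so you can review before emptying it.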