Hello all; long-time user and abuser of DT & DTtG, and I am generally pleased with the current versions. I have one persistent issue across all versions, though, alas: the duplicate function gives me erratic results. Files that are literally the same file don't match, while files of widely different sizes/resolutions get matched. It works well enough most of the time, but there are times it can be quite vexing.
Any which way, my question is this: there are any number of Mac apps that specialize in finding duplicates; does anyone have experience using these third-party tools on a DT database? It's obviously not a healthy thing to do to a database, but I have faith in being able to repair the damage, probably, so I thought it best to ask around first. The alternative would be dumping the files out to dedupe and then reloading the database, which may well be equally disruptive.
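For what it's worth, before pointing any third-party app at the database internals, one low-risk sanity check is a read-only checksum pass over an exported copy of the files; exact duplicates must have identical bytes, so grouping by size and then by SHA-256 flags them without touching anything. A minimal sketch in Python (the folder path and function names here are my own, not anything from DT):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large files don't load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(root):
    """Group files under root by size first (cheap), then confirm with SHA-256."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size can't have an exact duplicate
        for p in paths:
            by_hash[sha256_of(p)].append(p)
    # only groups with 2+ members are actual duplicates
    return [group for group in by_hash.values() if len(group) > 1]
```

This only reads files, so it's safe to run on an export; it won't explain fuzzy matches (different sizes/resolutions), only byte-identical ones.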
Thank you for the most excellent product otherwise, and keep up the good work.
Yes, I’ve tried it both strict & not so much. I’ve tried to suss out a pattern as to what matches and what doesn’t without success. It was the biggest thing on my wishlist when the new versions were announced, but it seems to work as it did before. I’ve not seen much difference in results between DT & DTtG for dupes, but I will keep an eye out for such now.
It is a very fine app that you & yours have made, and I value it highly. I do not in any way recommend that anyone else use it as I do, with 1 TB+ of data & multiple sync stores; I came by my dupes the easy way, by syncing boldly where syncs shouldn't be done… I will, as always, take your recommendations to heart.
Pray, is there a way to convince DT to reset and rescan for dupes?
You've not mentioned whether you did any database Optimisation, Verify & Repair, or Rebuild, or verified the database on the sync locations, or even used "Clean" to redo the syncs. Perhaps one or more of these might help? Perhaps try it first on one of your problematic databases? Full instructions for these troubleshooting steps are in the "Troubleshooting" chapter of the DEVONthink Manual and in Help.
I do the verify & repair regularly. It doesn't affect dupe counts, as best I've noticed.
I have not done a rebuild in a long time; it would be good to check.
The syncs I verify regularly, and I have done some recent connection cleans as well, after a server disk issue caused one database to go wonky; that is how I ended up with a pile of duplicates and pending files. I rebuilt the database from scratch, pulled the missing/pending files from a recent backup, and then started hunting dupes, which cleared several gigs' worth but left a bunch behind. I can manually verify that a file is in the group structure and is also in the suspected-dupe group I created, yet the two don't get matched. And then there are dupe matches with very different file sizes, mainly images that have reduced-size copies in the archive.
So I will try a rebuild on the database that has ghost dupes. I will also peel out about a quarter of my archives from DT, since DT is not a primary archival application; after a long search, I have hopefully worked out better ways to hoard my data, and will leave DT for my evil, evil plans.
Thank you for the input.
Oh, I meant my small, totally innocuous plans. I have other apps for the evil stuff.
Truly, I have nothing bad to say about the current state of DT deduping. Given the scope of everything else the app does, that it can do deduping of any sort is gratifying. Thank you all again for your work.
And oddly enough, the option I had forgotten about apparently did the trick: running a rebuild on the database with the dupe issue made all but 7 of my suspected dupes match. That was all well & good, but the Find Duplicate smart rule I have in the database then reported over 2,000 new duplicates found. Yeek.
Parsing by size and name, they appear to be almost all exact matches found across differing levels of the database's group structure. At least some are intentional, as in files I had put in two different places previously, but that hadn't triggered duplicate detection at the time. Others I am unsure of: some are PDF vs. PDF + Text, and a few are dupes hanging in the tag structure instead of the group structure, though the database passes verify with no issues. Using DT's clever pattern matching, I can chew through the dupes fairly efficiently, though it's a bit spooky: this total dupe count is well over 10% of the total database document count.
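Since nearly all of these turned out to be exact name-and-size matches, a quick triage script can pre-group a file listing before clicking through the results in DT. A rough sketch, assuming a listing exported from the database with name/size/path fields (the field names are my assumption, not a DT export format):

```python
from collections import defaultdict

def group_likely_dupes(rows):
    """Group file records by (lowercased name, size); groups with two or
    more members are likely exact duplicates worth inspecting by hand."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["name"].lower(), int(row["size"]))
        groups[key].append(row["path"])
    return {k: v for k, v in groups.items() if len(v) > 1}
```

This is only a triage pass: a name+size collision is strong but not conclusive evidence of a duplicate (and it will miss PDF vs. PDF + Text pairs, whose sizes differ), so the final call still belongs in DT.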
It will be interesting to see what the other database rebuilds will reveal. Thank you.