Duplicate content detection?

Would it be possible to add duplicate detection based on contents, rather than type+size?

It would be nice if this included metadata as well. For example, if I have two files and one has a URL and the other doesn’t, they wouldn’t be considered duplicates.

Duplicate detection doesn’t consider size and type unless you’ve enabled stricter duplicate detection in Preferences > General.

Metadata isn’t part of the content of a file so it wouldn’t factor into a contents-based duplicate detection. Development would have to assess extensions to the detection mechanism.
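
For comparison, here is a minimal sketch of what a strictly contents-based check could look like outside DEVONthink: it groups files by a SHA-256 hash of their raw bytes. This is an illustration only, not how DEVONthink detects duplicates, and the helper name `find_byte_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_byte_duplicates(paths):
    """Group files whose raw bytes are identical (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        # Hash the file's bytes only; metadata such as a URL or tags
        # stored outside the file is not part of this comparison.
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    # Keep only digests shared by two or more files.
    return [sorted(files) for files in groups.values() if len(files) > 1]
```

With a check like this, two files match only if every byte is the same, so it would not catch near-identical files the way a looser comparison can.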

I must not understand how duplicate detection works then. I have four files, all different, and they are being marked as duplicates.

The contents of each is


and the number at the end varies, depending on the reference.

Check the Instances dropdown in the Info inspector to see where the other duplicates of each file are.

I have a small number of PDF files that are completely different inside, yet they are marked as duplicates when “strict” checking is disabled. However, I’m interested in this “soft” way of checking for duplicates because it can find very similar files that really are duplicates I want to get rid of.

My solution is to have a tag called .false_match and then modify the duplicate smart group to ignore duplicates with that tag.

(BTW, I start the tag name with a period to indicate that it is a “system” tag and not a normal one.)

I know where they are. The content is different. That is the issue.

[Screenshot: Screen Shot 2021-01-12 at 12.10.03 PM]

The rendered content is not different in this case. Both documents contain a single word: bookends. If the source of the documents were not showing, the files would certainly appear to have the same content.

Development would have to assess modifying this behavior.
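
To picture the distinction between source and rendered content, here is a minimal sketch, assuming the documents are Markdown with inline links of the form `[text](target)`. The sample strings and the `example://` targets are made up for illustration, and this is not DEVONthink's implementation.

```python
import re

# Matches inline Markdown links of the form [text](target).
INLINE_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def visible_text(markdown_source):
    """Replace each inline link with its link text and trim whitespace."""
    return INLINE_LINK.sub(r"\1", markdown_source).strip()


# Hypothetical sources: same link text, different link target.
doc_a = "[bookends](example://reference/1)"
doc_b = "[bookends](example://reference/2)"

print(doc_a == doc_b)                              # False: raw sources differ
print(visible_text(doc_a) == visible_text(doc_b))  # True: both reduce to "bookends"
```

A comparison based on the visible text treats these two as identical, while a comparison of the raw source does not.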

I still think this is ridiculous, and is one of the reasons I stopped using DEVONthink.

We have users who expect this specific behavior, so I wouldn’t call it ridiculous on their behalf.

In Help > Appendix > Hidden Preferences, click the On link for IndexRawMarkdownSource, then run File > Rebuild Database.