I did a quick scan for information on how duplicates are identified… but I need to ask a couple of specific questions.
I have a database that contains emails imported from Apple Mail. In that database (expanded) is a Smart Rule that identifies duplicates. I did not create this rule; DT3 did.
Does that smart rule point to all of the duplicate copies or does it just contain a link to one of the duplicates?
If I selected all of the documents in the Duplicates Smart Rule, and moved them to the trash and emptied it, would I have deleted all copies of the email, or just the duplicates?
For example, this is the problem I am having:
The Duplicates rule returns this set of emails, and they are duplicates.
Smart groups/rules list all duplicates/copies (assuming that there are no additional conditions and the search scope is not limited to a group).
Therefore, selecting them and trashing them by hand would delete all of them. Instead, you could use either Scripts > Data > Move Duplicates To Trash… or the smart rule actions Move to Trash or Delete. Both the script and the rule actions ensure that one copy remains.
As usual, trying this out yourself on some test data first is highly recommended.
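To make the difference between "trash everything the smart group lists" and "trash duplicates but keep one copy" concrete, here is a minimal Python sketch. It is only an illustration of the idea, not DEVONthink's script or its internals: the message records, the `split_keep_one` helper and the choice to key on the message text are all assumptions made for the example.

```python
from collections import defaultdict

def split_keep_one(items, key):
    """Group items by key(item) and keep the first item of each group.

    Returns (kept, trashed). Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)

    kept, trashed = [], []
    for members in groups.values():
        kept.append(members[0])      # one copy of every group survives
        trashed.extend(members[1:])  # the remaining copies are removed
    return kept, trashed

# Made-up test data: (file name, content)
messages = [
    ("a.eml", "Lunch on Friday?"),
    ("b.eml", "Lunch on Friday?"),
    ("c.eml", "Quarterly report attached"),
    ("d.eml", "Lunch on Friday?"),
]

kept, trashed = split_keep_one(messages, key=lambda m: m[1])
kept_names = [name for name, _ in kept]
trashed_names = [name for name, _ in trashed]

# What a duplicates listing would show: every member of a group with
# more than one copy, including the copy you probably want to keep.
listed_names = sorted(kept_names[:1] + trashed_names)
```

In this sample, the listing contains a.eml, b.eml and d.eml, so trashing the whole listing removes every copy of that message, while the keep-one approach only removes b.eml and d.eml.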
Any thoughts on those two groups of messages? Why are some not duplicates? Why are others considered duplicates when the content is different? Do the rules/scripts you mentioned above use a different algorithm?
Duplicates, as has been discussed many times in the past, aren’t always byte-for-byte copies of documents. Documents that are sufficiently similar can be marked as duplicates.
If you need tighter matches, enable Preferences > General > General > Stricter recognition of duplicates.
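As a rough illustration of why "sufficiently similar" and "byte-for-byte identical" give different answers, here is a small Python sketch. It is not DEVONthink's algorithm, which is not described in this thread; the `difflib` similarity ratio, the whitespace/case normalization and the 0.9 threshold are all assumptions chosen for the example.

```python
import hashlib
from difflib import SequenceMatcher

def exact_duplicate(a: str, b: str) -> bool:
    """True only if both texts are byte-for-byte identical (compared via SHA-256)."""
    digest_a = hashlib.sha256(a.encode("utf-8")).digest()
    digest_b = hashlib.sha256(b.encode("utf-8")).digest()
    return digest_a == digest_b

def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def fuzzy_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """True if the normalized texts are at least `threshold` similar.

    The threshold is an arbitrary value for this sketch.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold

# Made-up sample messages
original = "Meeting moved to 3 pm on Friday. See you there."
near_copy = "Meeting moved to 3 pm on Friday.  See you there!\n"
unrelated = "Invoice 4411 is overdue."

same_bytes = exact_duplicate(original, near_copy)        # the bytes differ
similar = fuzzy_duplicate(original, near_copy)           # nearly the same text
strict = fuzzy_duplicate(original, near_copy, 1.0)       # stricter threshold
different = fuzzy_duplicate(original, unrelated)         # different message
```

Here the near copy fails the exact comparison but passes the looser similarity check, and raising the threshold makes the check reject it again. That is loosely analogous to what a stricter duplicate setting would be expected to do, though the actual matching in DEVONthink may work quite differently.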