RSS Feeds not identifying duplicates


I know that other users have identified similar issues with the strict criteria of duplication, but I wasn’t able to find anything to address my specific issue, so here goes. I signed up for a number of RSS feeds for the New Books Network, a group that publishes podcasts on various books. Many of these feeds are overlapping, so I’ve ended up with quite a few entries that are ostensibly identical, but have not been identified as duplicates. Going through these manually would be exceedingly tedious, since there are almost 7000 items in all of the feeds. I would like to do one of two things:

  1. Get DEVONthink to identify identical entries as duplicates. I’m not sure what’s going wrong here. DT lists the file sizes of these entries as slightly different, but the content looks identical, and the names are identical. See example here: (although in this example, the file sizes are identical)

  2. Identify identical names. I know that this feature has been unavailable in the past. Has anyone identified a workaround?


  • Are these detected as duplicates with Preferences > General > Stricter recognition of duplicates enabled?
  • DEVONthink does not detect duplicates by name.

I didn’t have the stricter recognition setting on before. Turning it on makes no difference.

Is the URL identical too? The content might indeed be slightly different if the size isn’t the same, and that’s basically what matters.

No, the URLs are not always the same.

If these files are indeed slightly different, is there any other way to identify near-duplicates?

Scripts > Data > Find & Remove Similar Contents… could be used but that requires user interaction. Or you could adapt this script for smart rules.
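For anyone who would rather check this outside DEVONthink, here is a rough Python sketch of the two ideas discussed above: grouping items by identical name, then comparing the text within each group to catch near-duplicates. It assumes the item names, URLs, and plain text have already been exported by some means into a list of dictionaries. The field names (`name`, `url`, `text`), the sample items, and the 0.9 similarity threshold are all made up for illustration and are not anything DEVONthink provides.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, trim, and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", name).strip().lower()


def group_by_name(items):
    """Return groups (two or more items) that share the same normalized name."""
    groups = defaultdict(list)
    for item in items:
        groups[normalize(item["name"])].append(item)
    return [group for group in groups.values() if len(group) > 1]


def near_duplicates(group, threshold=0.9):
    """Return (first, second, ratio) for pairs whose text similarity meets the threshold.

    The threshold is an arbitrary assumption; tune it against real items.
    """
    pairs = []
    for i, first in enumerate(group):
        for second in group[i + 1:]:
            matcher = SequenceMatcher(None, first["text"], second["text"], autojunk=False)
            ratio = matcher.ratio()
            if ratio >= threshold:
                pairs.append((first, second, ratio))
    return pairs


# Hypothetical sample data standing in for exported feed items.
items = [
    {"name": "Author A, \"Book One\"",
     "url": "https://example.com/history/1",
     "text": "An interview with Author A about Book One."},
    {"name": "  author a,  \"Book One\" ",
     "url": "https://example.com/politics/9",
     "text": "An interview with Author A about Book One. "},
    {"name": "Author B, \"Book Two\"",
     "url": "https://example.com/history/2",
     "text": "An interview with Author B about Book Two."},
]

for group in group_by_name(items):
    for first, second, ratio in near_duplicates(group):
        print(f"{first['url']} ~ {second['url']} (similarity {ratio:.2f})")
```

The comparison is pairwise within each name group, so it stays cheap as long as only a handful of items share a name. Items whose names differ are never compared, which matches the case described here (identical names, slightly different sizes and URLs) but would miss renamed copies.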