I’ve started using RSS feeds in DEVONthink more, and am running into the following problem: because of where the feeds come from, the same article sometimes shows up in more than one feed. Is there a straightforward way to deduplicate the feeds?
Each article has a URL (in the corresponding metadata field in DEVONthink), so it is possible to tell that two items are identical without resorting to more complicated content diffing. Does anyone already have a method to deduplicate RSS items in DEVONthink based on their URLs? Or, if not specifically for RSS items, perhaps a solution for other kinds of DEVONthink items could be adapted.
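If the URLs can be pulled out of DEVONthink by some means, the matching itself is simple. Below is a rough Python sketch of just that logic, not a DEVONthink script: it assumes each item is a hypothetical (feed, title, url) tuple, and the normalization rules (lowercasing scheme and host, dropping the fragment and a trailing slash) are assumptions about what should count as "the same URL". Finding and deleting the actual records inside DEVONthink would still need separate scripting, which isn't shown here.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Assumed rule: lowercase scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))


def find_duplicates(items):
    """Group (feed, title, url) tuples by normalized URL; keep only groups of 2+."""
    groups = defaultdict(list)
    for item in items:
        groups[normalize_url(item[2])].append(item)
    return {url: group for url, group in groups.items() if len(group) > 1}


def keep_first(items):
    """Return items in original order, keeping only the first one per normalized URL."""
    seen = set()
    kept = []
    for item in items:
        key = normalize_url(item[2])
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


if __name__ == "__main__":
    # Hypothetical sample data: the same post picked up by two hashtag feeds.
    items = [
        ("#devonthink", "Post A", "https://mastodon.social/@alice/111"),
        ("#pkm", "Post A", "https://mastodon.social/@alice/111/"),
        ("#pkm", "Post B", "https://mastodon.social/@bob/222"),
    ]
    for url, group in find_duplicates(items).items():
        print(url, [feed for feed, _, _ in group])
    print(len(keep_first(items)), "unique items")
```

Keeping the first occurrence is an arbitrary choice; you might instead prefer to keep the copy in a particular feed.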
This question was asked in 2019 but didn’t get a useful answer then. Jim Neumann has a blog post from 2020, but it involves a utility that searches folders outside of DEVONthink, not items that already exist in DEVONthink. A more recent post of Jim’s on using RSS in DEVONthink doesn’t address duplication.
The feeds are Mastodon hashtag searches. Posters often put several hashtags on a post, so searches for different hashtags can pick up the same post, and two different feeds will occasionally contain duplicates.
I’d assumed this was just something to tolerate; I’ve had the same thing with RSS feeds from newspaper sites.
e.g. the Guardian: the same article may show up in more than one feed (world news, UK news).
When large news sites let you pick your feeds from a wide range of general to specific topics, you’re going to get some duplication.
Really fast, responsive RSS readers like NetNewsWire make this a non-issue for me, but I can see how it would be a pain if you’re doing research.