I was wondering whether the RSS feed and Twitter feed importers also use duplicate identification.
As an example, let's say I bring in news from Google using Alerts with a particular search string, say “Widgets”. I am also monitoring the Widgets company via RSS. Now the Widgets company puts out an announcement, and Google captures it and puts it out as well.
Would it be possible, or is it already the case, that the duplicate identification feature available for documents also works for RSS feeds?
The recognition uses only the indexed contents of the items (or thumbnails, in the case of images). If the text is identical (separators, whitespace and case don’t matter), the item is marked as a duplicate.
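To make that rule concrete, here is a minimal Python sketch of exact-match duplicate detection on normalized text. It is not DEVONthink's actual implementation. The helper names `normalize`, `is_duplicate` and `group_duplicates` are hypothetical, and treating every non-alphanumeric character as a “separator” is an assumption. It covers only text items, not image thumbnails.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Drop separators and whitespace, then fold case.

    Assumption: a "separator" is any character that is not a letter or digit.
    """
    return re.sub(r"[\W_]+", "", text).casefold()


def is_duplicate(a: str, b: str) -> bool:
    """Two texts count as duplicates if they match exactly after normalization."""
    return normalize(a) == normalize(b)


def group_duplicates(items: dict[str, str]) -> list[list[str]]:
    """Group item names whose normalized text is identical.

    Using the normalized text as a dictionary key finds duplicates in a
    single pass instead of comparing every pair of items.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, text in items.items():
        groups[normalize(text)].append(name)
    return [names for names in groups.values() if len(names) > 1]


items = {
    "rss/widgets-company": "Widgets Inc. announces new product!",
    "alerts/google": "widgets inc announces new-product",
    "docs/notes": "Unrelated text",
}
print(group_duplicates(items))  # [['rss/widgets-company', 'alerts/google']]
```

Because the comparison is exact after normalization, a feed item that quotes only part of the announcement produces a different normalized string and would not be flagged by this kind of check.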
What about looking for similar content in Documents? That way I could click on an article and see whether there are any similar articles in the RSS feeds.
Ultimately, it would make research a lot easier if RSS feeds could be matched against similar content in your documents.
Thanks for the suggestion! Unfortunately, looking up similar contents is too slow to perform automatically in the background. On demand, however, you can of course view similar contents via the See Also & Classify inspector.
Criss has already explained how duplicates are identified. You may also want to explore the See Also & Classify inspector, which shows documents that appear to have related content.
You can find it under Tools > Inspectors or press Control-S to open it.