Script request: Find similar contents

Hey Gang!

Does anyone have the “Find similar contents” script they can share?

What do you consider “similar”? What kind of content: text, audio, image, video?

Same word count between documents

So
I love you
and
You love John
and
She loves her
are all “similar”?
Interesting.
Anyway, one possible approach (in JavaScript) would be to build an object whose keys are the word counts and whose values are arrays containing the UUIDs of the documents with that word count.

const similars = {};
function find_similar(records) {
  records.forEach(r => {
    const key = r.wordCount();
    if (!similars[key]) {
      /* No entry yet for this key:
         Create an array with this record's UUID as its only element */
      similars[key] = [r.uuid()];
    } else {
      /* There's an entry already for this key:
         Append this record's UUID to the list */
      similars[key].push(r.uuid());
    }
  });
  /* Output all similar objects:
     For each key, print the UUIDs if there's more than one */
  Object.keys(similars).forEach(key => {
    if (similars[key].length > 1) console.log(similars[key]);
  });
}
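
To try the grouping idea outside DEVONthink, here is a minimal, self-contained sketch of the same approach. It is not the script from the app: `groupByWordCount` and `mockRecord` are hypothetical names, and the mock objects only imitate the `wordCount()` and `uuid()` accessors used above. It returns the groups instead of printing them, which makes the result easier to check.

```javascript
// Group records by word count; return only word counts shared by 2+ records.
// `records` is any array of objects exposing wordCount() and uuid() methods,
// mirroring the accessors used in the script above.
function groupByWordCount(records) {
  const groups = new Map();
  for (const r of records) {
    const key = r.wordCount();
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(r.uuid());
  }
  // Drop word counts that occur only once
  return [...groups.entries()].filter(([, uuids]) => uuids.length > 1);
}

// Hypothetical stand-in for a DEVONthink record (assumption, for testing only).
// Words are counted by splitting on whitespace, which may differ from the app.
const mockRecord = (uuid, text) => ({
  uuid: () => uuid,
  wordCount: () => text.trim().split(/\s+/).length,
});

const sample = [
  mockRecord("A", "I love you"),
  mockRecord("B", "You love John"),
  mockRecord("C", "She loves her"),
  mockRecord("D", "Something else entirely here"),
];

const result = groupByWordCount(sample);
console.log(result);
```

With the three example sentences from this thread, records A, B and C end up in the same group because each has three words, while D (four words) is left out. A `Map` is used here instead of a plain object so the word counts stay numbers rather than being turned into string keys.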

An alternative without scripting is to open the default global smart group History, add the Word Count column, and sort by it. That is enough if the script is rarely needed or only a few items need to be processed.

Thank you @cgrunenberg, appreciate you taking the time to think about this and answer. I’ll have a play :blush:
