Smart Rule: Add Tag that is a string in the document

Hi,

I’m new to DEVONthink. Is there a way to have a smart rule add a tag taken from a string in the document?
I know how to use the smart rules feature, but I don’t know if DT can add such a tag.

In concrete terms, I want to create a smart rule like this:

If content is [year] (I guess I have to enter for year “[0-9][0-9][0-9][0-9]”, so that DT searches the file for a year)
Then add tag [year] (the year [0-9][0-9][0-9][0-9], so that it automatically adds the year DT found in the file)

Thank you!

This is described on page 259 of the DT manual

As Cambrian already said, it’s described in the documentation, which is excellent and a good first place to look. In your case, the smart rule might look like this (untested):

I’d be wary of the regular expression in this case, though: it will pick up any 4-digit run in the text. You might want to change it so that it is more specific, like ([12]\d{3}), which picks up any 4-digit run starting with 1 or 2.
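To make the difference between the two patterns concrete, here is a small standalone JavaScript sketch. It runs outside DEVONthink, and both the sample text and the helper name findAll are made up for illustration.

```javascript
// Hypothetical sample text, not taken from a real document.
const sample = "Invoice 4711 from 2021, order 98765, founded 1987.";

// Helper (name is an assumption): collect every match of a global regex as strings.
function findAll(text, regex) {
  return [...text.matchAll(regex)].map(m => m[0]);
}

// Any run of four digits, including parts of longer numbers.
const anyFourDigits = findAll(sample, /\d{4}/g);

// Only four-digit runs that start with 1 or 2.
const yearLike = findAll(sample, /[12]\d{3}/g);

console.log(anyFourDigits); // [ '4711', '2021', '9876', '1987' ]
console.log(yearLike);      // [ '2021', '1987' ]
```

The narrower pattern drops the invoice number and the fragment of the order number, but it would still accept something like 2020 paperclips.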


Oh, I guess I searched at the wrong place in the documentation.
Thank you both so much!

The cool thing about regular expressions (at least I think so) is to ‘solve’ your puzzle with the best match, so DT finds that piece of text you actually want.

As @chrillek mentioned, the regex in that example will match any 4 digit number. That might be good enough, if the text you’re searching doesn’t contain year-like numbers except the year you’re looking for (which could be unlikely).

Depending on your situation you might simply start your search with 20, but if the years you’re looking for are historic that obviously won’t work.

In the end, keep in mind DT isn’t aware of the context. It will regard the number 2020 as an equally good match whether it’s a year or 2020 paperclips. Sometimes you can solve that with a leading text like ‘Year’ or some other textual cue that always occurs next to the number. Good luck!
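As a rough illustration of the leading-text idea, the following standalone JavaScript sketch only accepts a year that directly follows the word Year. The label, the optional colon, and the function name yearAfterLabel are assumptions for the example; your documents may need a different cue.

```javascript
// Assumed pattern: the word "Year", an optional colon, optional whitespace,
// then a four-digit number starting with 1 or 2 (captured in group 1).
const labelledYear = /\bYear:?\s*([12]\d{3})\b/;

// Return the labelled year as a string, or null if the label is not present.
function yearAfterLabel(text) {
  const match = text.match(labelledYear);
  return match ? match[1] : null;
}

console.log(yearAfterLabel("We ordered 2020 paperclips. Year: 2019")); // '2019'
console.log(yearAfterLabel("We ordered 2020 paperclips."));            // null
```

In a smart rule, the same pattern could presumably be entered in the regex field, with the captured group used as the tag.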


The easiest solution is usually the “Add Tags from Document” action which supports multiple entered tags. All tags found in the text of the document are added.


Is it possible to use “add tags from document” with regex, similar to @chrillek’s suggestions above, to add all of the text captured by the regex as tags? For example, if I used (\d{4}) as the regex, could I add more than just the first match?

But there is only one match in this case. For more than one, you need more capturing groups. Perhaps you can elaborate?

This action doesn’t support regex; only the Scan Name and Scan Text actions optionally do. Afterwards the result can be used in the Add Tags action via the regex placeholders (\0, \1, \2 etc.).

Thank you, @chrillek and @cgrunenberg. I think both of your answers indicate that I can’t do what I was hoping to do, but I’ll just add an example just-in-case…

Let’s say I am trying to extract all of the four-digit numbers (using regex) from every document in a group and add those numbers to the documents they came from. So, document 1 might have four 4-digit numbers, document 2 might have six 4-digit numbers, document 3 has a different number of 4-digit numbers, etc. Is there a way to add all of the four-digit numbers to each document based on the results of a regex?

“Add” how: as tags? In metadata? In the name?

But yes, it’s possible with a script. Use matchAll(/\d{4}/g) in JavaScript (or something similar in ASObjC, as AppleScript doesn’t know regular expressions). That’ll give you (after a little massage) the number strings in an array. You could merge that with the tags values to prevent overwriting existing tags.
But be aware that this might lead to tag inflation.
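The merging step could look roughly like this in plain JavaScript. The function name mergeTags is made up; inside a DEVONthink script the existing tags would presumably come from something like r.tags(), which is an assumption here.

```javascript
// Combine existing tags with newly found ones, dropping duplicates.
// A Set keeps the first occurrence of each value in insertion order.
function mergeTags(existingTags, newTags) {
  return [...new Set([...existingTags, ...newTags])];
}

// Hypothetical example values.
const merged = mergeTags(["invoice", "2021"], ["2021", "1987"]);
console.log(merged); // [ 'invoice', '2021', '1987' ]
```

Existing tags stay in place and a number that is already a tag is not added a second time.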

Thank you, @chrillek. Yes, I was referring to adding tags. Thanks for the tip about scripting. I’ll go and try to figure that out. Thanks for your help

  • Is this based on real-world documents?
    • If so, do you have a few docs to share?

Hi @BLUEFROG,

Yes, I do. I’ve sent you a private message.

In JavaScript, something like this (to be used as script in a smart rule):

function performsmartrule(records) {
  records.forEach(r => {
    /* find all occurrences of _isolated_ 4-digit strings, i.e. not part of another string */
    const numbersFound = [...r.plainText().matchAll(/\b\d{4}\b/g)].map(m => m[0]);
    /* remove all duplicates by creating a Set from the Array */
    const numbersUnique = new Set(numbersFound);
    console.log(`${r.name()}: ${Array.from(numbersUnique)}\n`);
    /* Uncomment next line to add the numbers to the existing tags.
       Assigning a new Array is used here because r.tags is an object specifier,
       not a plain Array, so push() is not available on it. */
    //r.tags = [...r.tags(), ...numbersUnique];
  })
}

Note that this script uses a slightly modified regular expression: \b\d{4}\b finds all isolated 4-digit strings, i.e. it will find neither “1234” nor “2345” in “I live in 12345, Sunset Boulevard”.

The handling of the matchAll results requires a little explanation:

  • [… matchAll(…)] returns an Array of Arrays. The overall match (i.e. the \d{4}) is contained in the first element (index 0) of every inner Array.
  • map(m => m[0]) creates a new Array which contains these initial elements of every inner Array.
  • So, numbersFound is an Array containing all the isolated 4-digit strings found, including possible duplicates.
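To see the shape of these intermediate values, here is a standalone sketch that runs in any current JavaScript environment. The sample text is invented for the example.

```javascript
// Hypothetical sample text.
const text = "Built 1987, renovated 2021, sold 2021.";

// Spreading the iterator returned by matchAll gives one match Array per hit.
const matches = [...text.matchAll(/\b\d{4}\b/g)];

// Each match Array holds the overall match at index 0 and carries
// extra properties such as the position of the match in the text.
console.log(matches[0][0]);    // '1987'
console.log(matches[0].index); // 6

// Keep only the matched strings; duplicates are still present at this point.
const numbersFound = matches.map(m => m[0]);
console.log(numbersFound);     // [ '1987', '2021', '2021' ]
```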

The (for me) interesting part is the following conversion from an Array to a Set, which removes all duplicates (because elements of a set must be unique). No programming needed here, the JavaScript runtime takes care of that.
I’ve commented out the line setting the tags because I didn’t want to remove them manually in testing. Just remove the leading // to activate this.

Hi @chrillek,

Thanks for this. Unfortunately, I couldn’t get it to work. I get `on performSmartRule Error: Error: Can’t convert types.`

Thank you for taking the time to experiment with this. In the meantime I’ve found another work-around. I truly appreciate you taking the time to try to work through this.

Best wishes

It works ok here, so a bit more context would help - for example, a screenshot of your rule.