Smart Rule: Add Tag that is a string in the document

dmime · July 26, 2020, 11:02am

Hi,

I’m new to DEVONthink. Is there a way to use a smart rule to add a tag taken from a string in the document?
I know how to use the smart rules feature, but I don’t know if DT can add such a tag.

In concrete terms, I want to create a smart rule like this:

If content is [year] (I guess I have to enter “[0-9][0-9][0-9][0-9]” for the year, so that DT searches the file for a year)
Then add tag [year] (the year matched by [0-9][0-9][0-9][0-9], so that it automatically adds the year DT found in the file)

Thank you!

anon6914418 · July 26, 2020, 11:22am

This is described on page 259 of the DT manual.

chrillek · July 26, 2020, 11:28am

As Cambrian already said, it’s described in the documentation, which is excellent and a good place to start your research. In your case, the smart rule might look like this (untested):

I’d be wary of the regular expression in this case, though: it will pick up any 4-digit run in the text. You might want to make it more specific, like ([12]\d{3}), which picks up any 4-digit run starting with 1 or 2.
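A quick way to compare the two patterns is to run them over some sample text. This is a JavaScript sketch meant to run outside DEVONthink (e.g. in Node.js); the sample text and the helper `allMatches` are made up for illustration, and DEVONthink’s own regex engine may differ from JavaScript’s in details:

```javascript
// Hypothetical helper: return every substring matched by a global regex.
function allMatches(text, regex) {
  return [...text.matchAll(regex)].map(m => m[0]);
}

// Made-up sample text: a year, an order number, an amount and another year.
const sample = "Invoice 2020, order 7315, total 1250 EUR, founded 1987";

// Any run of 4 digits: the order number and the amount match too.
const anyFour = allMatches(sample, /(\d{4})/g);
// -> ["2020", "7315", "1250", "1987"]

// 4 digits starting with 1 or 2: the order number is gone, the amount is not.
const yearLike = allMatches(sample, /([12]\d{3})/g);
// -> ["2020", "1250", "1987"]

console.log(anyFour, yearLike);
```

As the amount shows, [12]\d{3} narrows the results but still can’t tell a year from any other number in that range.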

dmime · July 26, 2020, 11:45am

Oh, I guess I searched in the wrong place in the documentation.
Thank you both so much!

anon6914418 · July 26, 2020, 2:02pm

The cool thing about regular expressions (at least I think so) is that you can ‘solve’ your puzzle by writing the pattern that matches best, so DT finds the piece of text you actually want.

As @chrillek mentioned, the regex in that example will match any 4-digit number. That might be good enough if the text you’re searching doesn’t contain year-like numbers other than the year you’re looking for (which may be unlikely).

Depending on your situation, you might simply start your pattern with 20, but if the years you’re looking for are historic, that obviously won’t work.

In the end, keep in mind that DT isn’t aware of the context. It will regard the number 2020 as an equally good match whether it’s a year or 2020 paperclips. Sometimes you can solve that by requiring a leading text like ‘Year’ or some other text that always occurs next to the number. Good luck!
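Both ideas (starting the pattern with 20, or requiring a leading label) can be tried out in JavaScript, e.g. in Node.js. This is only a sketch: the sample text and the label `Year:` are made up, so substitute whatever text reliably appears next to the year in your own documents:

```javascript
// Made-up sample: one labelled year plus two numbers that only look like years.
const text = "Order of 2020 paperclips. Year: 2019. Shipped in 1998 boxes.";

// Restrict matches to 20xx by starting the pattern with 20.
const startsWith20 = [...text.matchAll(/\b20\d{2}\b/g)].map(m => m[0]);
// -> ["2020", "2019"]  (the paperclips still slip through)

// Require a leading label; capture group 1 holds only the year itself.
const labelled = [...text.matchAll(/\bYear:?\s*(\d{4})\b/g)].map(m => m[1]);
// -> ["2019"]

console.log(startsWith20, labelled);
```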

cgrunenberg · July 27, 2020, 8:09am

The easiest solution is usually the “Add Tags from Document” action which supports multiple entered tags. All tags found in the text of the document are added.

bangersandmash · July 24, 2023, 10:21am

Is it possible to use “Add Tags from Document” with regex, similar to @chrillek’s suggestion above, to add all of the text captured by the regex as tags? For example, if I used (\d{4}) as the regex, could I add more than just the first match?

chrillek · July 24, 2023, 10:30am

But there is only one match in this case. For more than one, you need more capturing groups. Perhaps you can elaborate?

cgrunenberg · July 24, 2023, 10:54am

This action doesn’t support regex; only the Scan Name and Scan Text actions optionally do. Afterwards, the result can be used in the Add Tags action via the regex placeholders (\0, \1, \2, etc.).
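To illustrate why a single scan yields only one value per placeholder, here is a JavaScript analogy (not DEVONthink’s implementation): a non-global match returns the first match only, with the whole match at index 0 and each capture group at the following indices. The mapping of index 0 to \0 and index 1 to \1 follows the usual regex convention and is an assumption about how the placeholders line up; the sample text is made up:

```javascript
// Made-up text containing two dates.
const text = "Statement dated 2023-07-24, previous statement 2022-12-31";

// Non-global match: only the FIRST match is returned, plus its groups.
const m = text.match(/(\d{4})-(\d{2})-(\d{2})/);

// m[0] is the whole match (comparable to \0), m[1] the first group (\1), ...
console.log(m[0], m[1], m[2]); // 2023-07-24 2023 07

// The second date (2022-12-31) is never reached without a global search.
```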

bangersandmash · July 24, 2023, 11:18am

Thank you, @chrillek and @cgrunenberg. I think both of your answers indicate that I can’t do what I was hoping to do, but I’ll add an example just in case…

Let’s say I am trying to extract all of the four-digit numbers (using regex) from every document in a group and add those numbers to the documents they came from. So, document 1 might have four 4-digit numbers, document 2 might have six, document 3 yet another number, etc. Is there a way to add all of the four-digit numbers to each document based on the results of a regex?

chrillek · July 24, 2023, 11:34am

„Add“ how – as tags? In metadata? In the name?

But yes, it’s possible with a script. Use matchAll(/\d{4}/g) in JavaScript (or something similar in ASObjC, as AppleScript doesn’t support regular expressions natively). That’ll give you (after a little massaging) the number strings in an array. You could merge that with the record’s current tags so that existing tags aren’t overwritten.
But be aware that this might lead to tag inflation.
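The matchAll-and-merge idea can be sketched as a plain JavaScript function, leaving out the DEVONthink part. The function name `mergeNumberTags` is hypothetical; in a smart rule script you would presumably feed it the record’s current tags and plain text and assign the result back to its tags:

```javascript
// Hypothetical helper: collect every isolated 4-digit number in `text`
// and merge the results with `existingTags`, without duplicates.
function mergeNumberTags(existingTags, text) {
  const found = [...text.matchAll(/\b\d{4}\b/g)].map(m => m[0]);
  // A Set keeps each value once, in first-seen order, so existing tags stay first.
  return [...new Set([...existingTags, ...found])];
}

const tags = mergeNumberTags(["invoice", "2020"], "Paid 2020, due 2021, ref 2020");
// -> ["invoice", "2020", "2021"]
console.log(tags);
```

Because every distinct 4-digit number becomes a tag, a document full of amounts or IDs will produce many tags, which is the tag inflation mentioned above.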

bangersandmash · July 24, 2023, 11:43am

Thank you, @chrillek. Yes, I was referring to adding tags. Thanks for the tip about scripting; I’ll go and try to figure that out. Thanks for your help.

BLUEFROG · July 24, 2023, 1:36pm

Is this based on real-world documents?
- If so, do you have a few docs to share?

bangersandmash · July 24, 2023, 2:08pm

Hi @BLUEFROG,

Yes, I do. I’ve sent you a private message.

chrillek · July 24, 2023, 3:09pm

In JavaScript, something like this (to be used as a script in a smart rule):

```javascript
function performsmartrule(records) {
  records.forEach(r => {
    /* find all occurrences of _isolated_ 4-digit strings, i.e. not part of a longer string */
    const numbersFound = [...r.plainText().matchAll(/\b\d{4}\b/g)].map(m => m[0]);
    /* remove all duplicates by creating a Set from the Array */
    const numbersUnique = new Set(numbersFound);
    console.log(`${r.name()}: ${Array.from(numbersUnique)}\n`);
    /* Uncomment the next line to set the tags. A JXA property can't be changed
       with push(): read the current tags, merge them with the new numbers
       (the Set drops duplicates) and assign the result back. */
    // r.tags = [...new Set([...r.tags(), ...numbersUnique])];
  });
}
```

Note that this script uses a slightly modified regular expression: \b\d{4}\b finds only isolated 4-digit strings, i.e. it will find neither “1234” nor “2345” in “I live in 12345, Sunset Boulevard”.
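The effect of the word boundaries can be checked outside DEVONthink, e.g. in Node.js. The helper `find` is made up for this sketch; \b only matches where a word character (letter, digit or underscore) meets a non-word character or the start/end of the text:

```javascript
// Hypothetical helper: all non-overlapping matches of a global regex.
const find = (text, regex) => [...text.matchAll(regex)].map(m => m[0]);

const address = "I live in 12345, Sunset Boulevard";

// Without word boundaries, the first four digits of 12345 match.
const loose = find(address, /\d{4}/g);        // -> ["1234"]

// With \b on both sides, the 4 digits must not touch other word characters.
const isolated = find(address, /\b\d{4}\b/g); // -> []

// A free-standing 4-digit number is still found.
const alone = find("Moved here in 1999.", /\b\d{4}\b/g); // -> ["1999"]

console.log(loose, isolated, alone);
```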

The handling of the matchAll results requires a little explanation:

[… matchAll(…)] returns an Array of Arrays: matchAll yields one match Array per match, and the spread syntax […] collects them into an outer Array. The overall match (i.e. whatever \b\d{4}\b matched) is the first element (index 0) of every inner Array.
map(m => m[0]) creates a new Array containing these first elements of every inner Array.
So, numbersFound is an Array containing all the isolated 4-digit strings found, including possible duplicates.
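Here is what those inner Arrays look like, as a small sketch with made-up text. Each match Array also carries an `index` property with the match position; note that matchAll throws a TypeError if the regex lacks the g flag:

```javascript
// Made-up text with a repeated year.
const text = "Founded 1999, moved 2020, renamed 1999";
const matches = [...text.matchAll(/\b\d{4}\b/g)];

// Index 0 of each match Array is the matched text; .index is its position.
const withPositions = matches.map(m => [m[0], m.index]);
// -> [["1999", 8], ["2020", 20], ["1999", 34]]

// Keeping only index 0 gives the plain list, duplicates included.
const numbersFound = matches.map(m => m[0]);
// -> ["1999", "2020", "1999"]

console.log(withPositions, numbersFound);
```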

The (for me) interesting part is the following conversion from an Array to a Set, which removes all duplicates (because elements of a set must be unique). No programming needed here, the JavaScript runtime takes care of that.
I’ve commented out the line setting the tags because I didn’t want to remove them manually while testing. Just remove the leading // to activate it.
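The Array-to-Set deduplication works the same way in plain JavaScript; this sketch uses a made-up list of matches:

```javascript
// Made-up result of the matchAll step, with duplicates.
const numbersFound = ["1999", "2020", "1999", "2021", "2020"];

// Building a Set drops the duplicates; spreading it back yields an Array
// in first-seen order (Array.from(set) does the same).
const numbersUnique = [...new Set(numbersFound)];
// -> ["1999", "2020", "2021"]

// Sets compare values with SameValueZero, so the string "2020" and the
// number 2020 count as two different entries.
const mixed = new Set(["2020", 2020]);
// mixed.size -> 2

console.log(numbersUnique, mixed.size);
```

Since matchAll always returns strings, the string/number distinction doesn’t matter for the script itself; it only matters if numbers from elsewhere are merged in.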

bangersandmash · July 25, 2023, 12:12am

Hi @chrillek,

Thanks for this. Unfortunately, I couldn’t get it to work. I get “on performSmartRule Error: Error: Can’t convert types.”

Thank you for taking the time to experiment with this. In the meantime I’ve found another work-around. I truly appreciate you taking the time to try to work through this.

Best wishes

chrillek · July 25, 2023, 5:50am

It works ok here, so a bit more context would help - for example, a screenshot of your rule.