Scan Text, Regular Expressions, and Custom Date Metadata

Is there any way to use a Scan Text regular expression result in a custom metadata date field?

I was trying to enter a statement ending date in a custom field. At first, I thought I couldn’t do it using a Scan Text action with Date selected, because the opening date comes before the ending date, like “Opening/Closing Date 10/14/24 - 11/13/24”. I didn’t think I could come up with a prefix and suffix that would correctly identify the second date.

Ultimately, before I finished composing this question, I realized I could use " - *" followed by a line return for the Scan Text > Date, so I've solved it this time. But it isn't the first time I've wondered about using a regular expression for a custom metadata date, so I still want to ask the question for future reference and to improve my understanding of the Scan Text action.
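Conceptually, a prefix/suffix scan returns the text between the first occurrence of the prefix and the next occurrence of the suffix. The sketch below approximates that behaviour in plain JavaScript to show why " - " plus a line return isolates the closing date. It is an assumption about how Scan Text behaves, not DEVONthink's actual implementation, and `extractBetween` is a hypothetical helper.

```javascript
// Hypothetical approximation of a prefix/suffix scan (not DEVONthink's code):
// return the text between the first `prefix` and the next `suffix` after it.
function extractBetween(text, prefix, suffix) {
  const start = text.indexOf(prefix);
  if (start === -1) return null; // prefix not found: no result
  const from = start + prefix.length;
  const end = text.indexOf(suffix, from);
  if (end === -1) return null; // suffix not found after the prefix
  return text.slice(from, end);
}

const sample = "Opening/Closing Date 10/14/24 - 11/13/24\nAccount Summary";
// " - " only occurs between the two dates, and the line return ends the closing date
console.log(extractBetween(sample, " - ", "\n")); // 11/13/24
```

The weakness shows up here too: if " - " appears earlier in the document, the first occurrence wins, which is why a less precise prefix can pick up the wrong text.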

I must say that Scan Text seems like a really powerful feature that I am only just starting to get the hang of, but it also gives me some powerful headaches! I can't easily wrap my brain around its workflow. Perhaps I've become spoiled by Hazel's match token system, but I really wish Scan Text were easier to use, particularly since you can only pull out one piece of information per action, as far as I can tell. (With the possible exception of regular expressions? Can you grab multiple capture groups from one action with that? It still wouldn't help with date fields, though, which have so far accounted for most of my uses of the Scan Text feature.)

There has been at least one related thread here recently. And IIRC, only custom metadata of type text can be set directly in a smart rule. For dates, you need a script, and that, I'm confident, has also been demonstrated here before.

Afaik: yes.

I don't understand this, because I just made a number of smart rules using a Scan Text action with Date selected, followed by a “Change <custom date field name>” action that sets the date to “Document Date”. They work fine, except for one I'm still grappling with: it sets a date, but not the one that should have been found by the prefix and suffix text.

No, there isn’t anything like a Scan Metadata command.

And it's unclear what you specifically want to do with RegEx, but yes, you can, e.g., capture multiple strings.
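For example, a single regular expression with two capture groups can pull both dates out of the statement line in one match. This is a plain JavaScript sketch; the sample string is the example from this thread, and the MM/DD/YY format is assumed.

```javascript
// Two capture groups: one for the opening date, one for the closing date.
const dateRange = /(\d\d\/\d\d\/\d\d)\s+-\s+(\d\d\/\d\d\/\d\d)/;

const line = "Opening/Closing Date 10/14/24 - 11/13/24";
const m = line.match(dateRange);
if (m) {
  const [, opening, closing] = m; // m[0] is the whole match, m[1] and m[2] the groups
  console.log(opening); // 10/14/24
  console.log(closing); // 11/13/24
}
```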

I think it's just that I'm relatively familiar with regex, or with a matching system like Hazel's, so my brain thinks along those lines when I'm trying to figure out how to pick out specific data from document contents. The Scan Text matching system is different enough that at times I have to shift to a different mindset to work out how to do it in Scan Text without a regex.

For instance, take the example I gave from the statement I'm looking at: “Opening/Closing Date 10/14/24 - 11/13/24”. My first thought is something like “look for the string ‘Opening/Closing Date’, followed by a date, followed by the string ‘ - ’, then grab the date after that”. But because Scan Text only allows one wildcard (aside from the regular expression option), I can't grab just that second date while using “Opening/Closing Date” as part of the prefix. So I have to use the less precise " - " as the prefix, with a line return (and maybe some text from the beginning of the next line, if it's consistent) as the suffix, and hope that is unique enough to capture what I want.
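In a regular expression, that thought process translates almost word for word: match the label, skip the first date and the dash, and capture only the second date. A sketch in plain JavaScript, assuming the label text and the MM/DD/YY format from the example above:

```javascript
// Anchor on the label, skip the opening date, capture only the closing date.
const closingDate = /Opening\/Closing Date\s+\d\d\/\d\d\/\d\d\s+-\s+(\d\d\/\d\d\/\d\d)/;

const text = "Statement\nOpening/Closing Date 10/14/24 - 11/13/24\nBalance";
const found = text.match(closingDate);
console.log(found ? found[1] : "no match"); // 11/13/24
```

Because the label is part of the pattern, a stray " - " elsewhere in the document cannot produce a false match.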

At least, this is how it works as far as I’ve been able to tell.

For the record, I figured out my problem with this. My Scan Text prefix/suffix weren't quite right, so no result was found, which makes the Document Date placeholder return whatever it would by default, i.e. another date in the document (the first date?).

I used a Scan Text with the String option to test it out until I found the right prefix/suffix.
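The same trial-and-error approach works for a regular expression before it goes into a script: run the pattern over a few sample texts and check that it matches where it should and returns nothing where it shouldn't. A minimal sketch; `tryPattern` is a hypothetical helper and the sample strings are made up for illustration.

```javascript
// Check a pattern against sample texts and collect the first captured group (or null).
function tryPattern(re, samples) {
  return samples.map(s => {
    const m = s.match(re);
    return m ? m[1] : null; // null means "no result", the case that silently falls back
  });
}

const re = /Opening\/Closing Date\s+\d\d\/\d\d\/\d\d\s+-\s+(\d\d\/\d\d\/\d\d)/;
const samples = [
  "Opening/Closing Date 10/14/24 - 11/13/24", // should yield 11/13/24
  "Statement period 10/14/24 - 11/13/24",     // different label: should yield null
];
const results = tryPattern(re, samples); // returns ["11/13/24", null]
console.log(results);
```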

I realize that. I just wish there was!

So, why not use a RE then? IMO they are more versatile than the simple wildcard stuff.

I’m sorry for the delayed response, but for the record, the answer to this is:

Because I'm working with dates that I want to put into custom date fields. As far as I've been able to tell, there's no way to insert the result of a regular expression Scan Text operation into a date field, because I think it's just a string. There's no way to convert it or have it recognized as a date, and no way to insert the $1, etc. token into a date field. And if I don't put it into a date field, I can't use it the way I need to.
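Inside a script the conversion is explicit: split the captured string into its parts and build a Date from them. The sketch below assumes MM/DD/YY input and that two-digit years belong to the 2000s; `parseShortDate` is a hypothetical helper. It returns null for impossible dates instead of letting JavaScript silently roll them over (02/30/24 would otherwise become March 1).

```javascript
// Hypothetical helper: "MM/DD/YY" -> local Date, or null if it is not a real date.
// Assumes two-digit years are in the 2000s.
function parseShortDate(s) {
  const m = /^(\d\d)\/(\d\d)\/(\d\d)$/.exec(s);
  if (!m) return null;
  const month = Number(m[1]);
  const day = Number(m[2]);
  const year = 2000 + Number(m[3]);
  const d = new Date(year, month - 1, day); // months are zero-based in JavaScript
  // new Date() rolls invalid values over (Feb 30 -> Mar 1), so verify the parts survived
  if (d.getFullYear() !== year || d.getMonth() !== month - 1 || d.getDate() !== day) {
    return null;
  }
  return d;
}

console.log(parseShortDate("11/13/24").toDateString()); // Wed Nov 13 2024
console.log(parseShortDate("02/30/24")); // null
```

A Date built this way is what a smart rule script can hand to `addCustomMetaData` for a date field.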

It’s trivial to do with a script that the smart rule executes.

For example (in JavaScript):

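function performsmartrule(records) {
  const app = Application("DEVONthink 3");
  // Matches "MM/DD/YY - MM/DD/YY" and captures month, day and year of the second date
  const re = /\d\d\/\d\d\/\d\d\s+-\s+(\d\d)\/(\d\d)\/(\d\d)/;
  records.forEach(r => {
    const match = r.plainText().match(re);
    if (match) {
      const month = Number(match[1]);
      const day = Number(match[2]);
      // Two-digit year, assumed to be 20xx; Number() avoids string concatenation
      const year = 2000 + Number(match[3]);
      // Build the date from its parts (months are zero-based) so it stays in local time
      const date = new Date(year, month - 1, day);
      app.addCustomMetaData(date, {for: 'customDateField', to: r});
    }
  });
}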

to set a custom metadata field ‘customDateField’. The RE is taken from your previous post.
