Regular Expressions and Smart Rules v3.5

Indeed; see NSRegularExpression in the Apple Developer Documentation.

Great! Might I suggest adding a reference to this page (not sure if it’s static enough), or to ICU in general, in the manual? Most regular expressions work, but the dialects can confuse you if you’re not using the right one. Much like natural language 🙂
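A small illustration of how the dialects can differ: NSRegularExpression uses ICU syntax, while a JXA script uses JavaScript’s own RegExp engine. ICU accepts possessive quantifiers such as \d++, but JavaScript rejects them with a SyntaxError, so a pattern copied from a smart rule into a script may need adjusting. The helper name compilesInJS is made up for this sketch.

```javascript
// Hypothetical helper: returns true if the pattern compiles with JavaScript's RegExp.
function compilesInJS(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (e) {
    // SyntaxError: the pattern uses syntax that JavaScript does not support
    return false;
  }
}

console.log(compilesInJS("\\d+"));  // → true
console.log(compilesInJS("\\d++")); // → false (possessive quantifier, valid in ICU only)
```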

For those who are using both DT and DTTG, Shortcuts also uses ICU regex.

Let’s say I scan the text using a regex and store the match. In my particular case, I look for the Dutch word for invoice, “Factuur”, and grab the number that comes right after it, i.e. the actual invoice number.

Ideally I would want to do something like this:


Here, ContractNo is a user-defined metadata field.
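For reference, a regex that grabs the number after “Factuur” could look like the sketch below, written as plain JavaScript so it can be tried in Script Editor or Node. The pattern, the optional “nummer”/“nr.” suffixes and the assumption that the invoice number starts with a digit and may contain hyphens are all guesses about typical Dutch invoices; extractInvoiceNumber is a hypothetical name.

```javascript
// Hypothetical helper: finds "Factuur", "Factuurnummer" or "Factuur nr." and captures
// the number that follows. Assumes the number starts with a digit and may contain hyphens.
function extractInvoiceNumber(text) {
  const re = /\bFactuur\s*(?:nummer|nr\.?)?\s*:?\s*(\d[\d-]*)/i;
  const match = text.match(re);
  // match[1] is the text captured by the parentheses, i.e. the invoice number
  return match ? match[1] : null;
}

console.log(extractInvoiceNumber("Factuur 2021-0042 d.d. 3 maart")); // → 2021-0042
console.log(extractInvoiceNumber("Factuurnummer: 12345"));           // → 12345
console.log(extractInvoiceNumber("Factuurdatum: 01-02-2021"));       // → null
```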

Unfortunately, user-defined metadata cannot (yet) be set using a regex; I hope this will be possible in the future. For the moment, I’m wondering what to change in the script above to capture the regex output and then set my metadata. Is selection above the output of the regex?

Would a tag with the matched text work for you?

Only text-based custom metadata can currently use placeholders and regex results.

I can offer the JavaScript below. It looks for the largest euro amount in a document. For US dollars, you’ll have to adjust:

  • the regular expression amount after the commented-out section
  • probably the removal of the € sign and other characters in var digits =
  • the name of the metadata field (it’s “Betrag” here)
  • remove the if … else immediately after that; it is necessary only for regions where a comma is used as the decimal separator and a dot as the thousands separator
  • however, for the US you might want to consider removing blanks from the string before converting it to a number

The script loops over all selected documents, determines the maximum amount, and then asks for user confirmation before setting the custom metadata field. I added this confirmation because the approach does not work with some discounts: in those cases, the maximum amount in the document might not be the amount to be paid.

(() => {
  const app = Application('DEVONthink 3');
  app.includeStandardAdditions = true;

  const sel = app.selection();

  /*
  var amount_comma = "[\\d.]+(?:,\\d\\d)";
  var amount_dot = "\\d+(?:\\.\\d\\d)";
  var amount_euro = "(?:\\s*(€|(EUR|Eur)[oO]))?";
  var amount = new RegExp("(" + amount_comma + ")|(" + amount_dot + ")" + amount_euro, "g");
  */

  /* An amount with decimal comma ("1.234,56") or decimal dot ("1234.56"),
     followed by "€", "EUR" or "Euro", or by anything that is not a digit or a dot */
  const amount = /(([\d.]+(?:,\d\d))|(\d+(?:\.\d\d)))(\s*(€|(EUR|Eur)[oO]?)|(?![\d.]))/g;

  sel.forEach(el => {
    const betrag = app.getCustomMetaData({for: "Betrag", from: el});
    const txt = el.plainText();
    const matches = txt.match(amount); /* all amounts found in the text, or null */
    let maxAmount = 0;

    /* only handle documents that contain amounts and have no "Betrag" yet */
    if (matches !== null && betrag === null) {
      matches.forEach(euro => {
        const digits = euro.replace(/[€a-z\s]/gi, ""); /* remove all letters, euro sign and spaces */
        let val;
        if (/,\d\d$/.test(digits)) {
          /* decimal separator = comma: remove all dots from number and replace comma with dot */
          val = Number(digits.replace(/\./g, "").replace(/,/, "."));
        } else {
          /* number contains commas, probably US format: remove them */
          val = Number(digits.replace(/,/g, ""));
        }
        maxAmount = val > maxAmount ? val : maxAmount;
      });

      /* "Betrag setzen?" = "Set amount?", "Nein" = "No" */
      const answer = app.displayDialog("Betrag setzen?", {
        defaultAnswer: maxAmount + " €",
        buttons: ["Nein", "OK"],
        defaultButton: "OK"
      });
      if (answer.buttonReturned === "OK") {
        app.addCustomMetaData(answer.textReturned, {for: "Betrag", to: el});
      }
    }
  });
})();
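To experiment with the amount detection without DEVONthink, the parsing part of the script above can be pulled out into plain functions, e.g. for Node or Script Editor. This is a sketch that reuses the script’s regex (with its (:? groups written as the intended non-capturing (?:) and its normalization rules; toNumber and largestAmount are hypothetical names, and the sample text is made up.

```javascript
// Same expression as in the script: decimal-comma or decimal-dot amounts, optionally followed by €/EUR/Euro
const AMOUNT = /(([\d.]+(?:,\d\d))|(\d+(?:\.\d\d)))(\s*(€|(EUR|Eur)[oO]?)|(?![\d.]))/g;

// Convert one matched string to a Number using the script's rules
function toNumber(found) {
  const digits = found.replace(/[€a-z\s]/gi, ""); // drop letters, euro sign and spaces
  if (/,\d\d$/.test(digits)) {
    // decimal comma: "1.234,56" becomes "1234.56"
    return Number(digits.replace(/\./g, "").replace(/,/, "."));
  }
  // otherwise remove any commas and treat the dot as decimal separator
  return Number(digits.replace(/,/g, ""));
}

// Largest amount in the text, or null if there is none
function largestAmount(text) {
  const matches = text.match(AMOUNT);
  if (matches === null) return null;
  return matches.map(toNumber).reduce((max, val) => (val > max ? val : max), 0);
}

console.log(largestAmount("Subtotal 1.000,00 €, VAT 190,00 €, Total 1.190,00 EUR")); // → 1190
```

As the post says, the largest amount is not always the one to be paid (discounts), which is why the script asks for confirmation.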

Thanks @cgrunenberg. I didn’t realise this. For my immediate purposes it doesn’t matter whether it’s a text or a numeric quantity. Following up, I created a new custom metadata field InvoiceNo and tested it using the following smart rule.


The solution is not as elegant as storing the value in numeric form, but it works fine. The potential for auto-processing documents is definitely there, so I encourage further development, including the ability to set other types of metadata via regex.

@chrillek Thanks for the industrial-strength solution using JS. It’s also a good option to explore for multiple documents. I like the idea of looking for the largest amount (which should correspond to the total).

Thanks. That’s also an option, and I can confirm that it works by combining the regex with memory (capturing) parentheses and the Add Tags option.

But beware of those pesky discounts: Deutsche Telekom Mobile bills, for example, cannot be processed reliably with this script. That might not concern you, though 😉
Since you seem to be working on Dutch or Flemish documents, the euro-specific parts of the script may still be useful for you.
BTW: I think alphanumeric invoice numbers are OK. At least I seem to be getting invoices with numbers like “WRYYYY####” or “YYYY-##”. Your mileage may vary, of course.
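If you want to match such alphanumeric invoice numbers, a pattern could look like this sketch. It assumes “WRYYYY####” means the letters WR, a four-digit year and four more digits, and “YYYY-##” a year, a hyphen and two digits; the negative lookahead keeps dates like 2021-07-15 from matching. findInvoiceNo is a hypothetical name.

```javascript
// Hypothetical patterns: "WR" + 8 digits, or "YYYY-##" not followed by more word characters or a hyphen
const INVOICE_NO = /\b(WR\d{8}|\d{4}-\d{2})(?![\w-])/;

function findInvoiceNo(text) {
  const match = text.match(INVOICE_NO);
  return match ? match[1] : null;
}

console.log(findInvoiceNo("Invoice number WR20210042")); // → WR20210042
console.log(findInvoiceNo("Invoice 2021-07, thank you")); // → 2021-07
console.log(findInvoiceNo("Date 2021-07-15"));           // → null (a date, not an invoice number)
```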

Is this still the case? Is there no way to capture a numeric or floating-point value with a regex and put it into a numeric custom metadata field? Or into Date fields for date regex queries?

The results of the Scan Name/Text > Date actions can be used for custom metadata dates, but changing numeric values that way isn’t possible yet.

There is a way: scripting, as so often:

function performsmartrule(records) {
  const RE = /\b\d+\.\d\d\b/; /* word boundary, at least one digit, decimal point, exactly two digits, word boundary */
  records.forEach(r => {
    const txt = r.plainText();
    const match = txt.match(RE);
    if (match) {
    /* Do whatever you want with match[0] */
   }
  })
}
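Turning match[0] into a number is a single Number() call. The sketch below wraps the same regex in a hypothetical helper firstAmount so it can be tried outside a smart rule. Note that this regex finds only "234.56" in "1,234.56", because the comma is not part of \d+.

```javascript
// Same regular expression as in the smart-rule script
const RE = /\b\d+\.\d\d\b/;

// Hypothetical helper: first amount in the text as a number, or null
function firstAmount(text) {
  const match = text.match(RE);
  // match[0] is the whole matched text, e.g. "49.90"
  return match ? Number(match[0]) : null;
}

console.log(firstAmount("Total due: 49.90 USD")); // → 49.9
console.log(firstAmount("No amount here"));       // → null
```

Inside performsmartrule, the resulting number could then be handed to DEVONthink, e.g. with app.addCustomMetaData as the euro script above does.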

Looks interesting. I’m not a JavaScript programmer, however. Can you expand your example to show the value going into a custom metadata number or decimal field? And/or some way to display the result of match[0] in an alert?

Searching for “addCustomMetaData” in the forum should turn up something. If not, let me know.

Yes, I have figured that part out, but now I get an error with this code:

function performsmartrule(records) {	
	const app = Application("DEVONthink 3");
	app.includeStandardAdditions = true;
 	const RE =  /Payment Amount \$((\d+)\,(\d+.\d+))/; 

/* \b\d+\.\d\d\b/; /* word boundary, at least one digit, decimal point, exactly two digits, word boundary */


  records.forEach(r => {
    const txt = r.plainText();
    const match = txt.match(RE);

    if (match) {
    /* Do whatever you want with match[3] */
	const newNum = parseFloat(match[1].replace(',',''));
	app.displayDialog(newNum);
	r.addCustomMetadata(newNum, {for: "number", to: r});
   }
  })
}

I would suggest debugging the basic functionality in Script Editor.app first, e.g. to see which line fails.

Some detail on @cgrunenberg’s advice about using Script Editor: if you use this construct

function performsmartrule(records) {
…
}

(() => {
  const app = Application.currentApplication();
  if (app.name() !== "DEVONthink 3") {
    performsmartrule(Application("DEVONthink 3").selectedRecords());
  }
})()

you can use the same code in Script Editor and in a smart rule.

As to your regular expression:

  • If your PDFs have been OCR’d, you should use a more robust expression, with \s+ instead of a literal space.
  • The dot in your expression matches any character, not only a dot. You must escape the dot \. to make it match a dot only.
  • OTOH, escaping the comma is not necessary: \, can be rewritten as ,.
  • As it stands now, your expression only matches amounts of $1,000 or more. If that is not intended, you should make (\d+), optional.
  • The last group of digits should be written as \d\d, as you only want two of them, not an arbitrary number (and most probably not one).
  • You’re overusing capturing groups. As you only want the amount, there’s no point in capturing the thousands or the rest.

I’d rewrite the RE like this:
/Payment\s+Amount\s+\$((?:\d+,)*\d+(?:\.\d\d)?)/
That is: one capturing group for the whole amount. One non-capturing group for the optional thousands part (?:\d+,)*, which may appear any number of times, including zero. Use \s+ (at least one whitespace character) instead of a literal space (exactly one space). Escape the dot, don’t escape the comma. Exactly two digits following the dot, the whole decimal part being optional.

Note that this is not a failsafe RE to capture amounts. It will also match $10,2, for example. A lot more work would be needed to make that a truly comprehensive RE for amounts.
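To try the suggested expression outside DEVONthink, here is a sketch in plain JavaScript that also converts the captured group to a number by removing the commas; it reproduces the $10,2 caveat. PAYMENT_STRICT is a hypothetical, stricter variant that requires thousands groups of exactly three digits and rejects a trailing digit or comma. It is still not a comprehensive amount parser.

```javascript
// The suggested expression: one capturing group for the whole amount
const PAYMENT = /Payment\s+Amount\s+\$((?:\d+,)*\d+(?:\.\d\d)?)/;

// Hypothetical stricter variant: "1,234" style groups of three, or plain digits
const PAYMENT_STRICT = /Payment\s+Amount\s+\$((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d\d)?)(?![\d,])/;

function paymentAmount(text, re) {
  const match = text.match(re);
  // match[1] is the captured amount; drop thousands separators before converting
  return match ? Number(match[1].replace(/,/g, "")) : null;
}

console.log(paymentAmount("Payment Amount $1,234.56", PAYMENT));        // → 1234.56
console.log(paymentAmount("Payment  Amount\n$45", PAYMENT));            // → 45
console.log(paymentAmount("Payment Amount $10,2", PAYMENT));            // → 102 (the weakness noted above)
console.log(paymentAmount("Payment Amount $10,2", PAYMENT_STRICT));     // → null
console.log(paymentAmount("Payment Amount $1,234.56", PAYMENT_STRICT)); // → 1234.56
```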

Thank you, and thanks to @cgrunenberg. I am now using Script Editor to help debug. I still get the same error regarding number conversion.

I believe this is happening on the line that executes addCustomMetaData, because if I take that line out I get an undefined result instead of an error. The field in DEVONthink is a Decimal Number formatted as a Number.

I have no idea why that is not working; it should (and I have similar code where it does). Anyway, this seems to do the trick:

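…
const md = r.customMetaData() || {}; /* existing custom metadata, or an empty object */
md["mdnumber"] = newNum;             /* key for the custom field "number" */
r.customMetaData = md;               /* write all custom metadata back to the record */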

Alternatively, try changing the field’s format from Number to “Currency”; that’s what I use.

And in my experience, it’s simpler to use console.log("your output here") than app.displayDialog: I rarely see a reason to have a dialog pop up that I then have to acknowledge.

That’s all working fine now, thank you very much. I am trying to do a similar operation to capture the due date and put it into a DEVONthink custom metadata Date field. Do you have any tips for this? What I am doing is not working: the due date does not update on the document in DEVONthink, although I don’t receive any errors.
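One possible first step is to turn the matched text into a real JavaScript Date instead of a string. The sketch below assumes the due date appears as “Due Date 03/15/2024” (US month/day/year order); the regex and the helper name dueDate are hypothetical. Whether DEVONthink accepts such a Date object for a Date custom metadata field, via addCustomMetaData or customMetaData, is an assumption not verified here.

```javascript
// Hypothetical format: "Due Date" (optional colon), then MM/DD/YYYY
const DUE_DATE = /Due\s+Date:?\s+(\d{1,2})\/(\d{1,2})\/(\d{4})/;

function dueDate(text) {
  const match = text.match(DUE_DATE);
  if (!match) return null;
  const [, month, day, year] = match.map(Number);
  // JavaScript months are zero-based, hence month - 1
  return new Date(year, month - 1, day);
}

const d = dueDate("Due Date: 03/15/2024");
console.log(d.getFullYear(), d.getMonth() + 1, d.getDate()); // → 2024 3 15
```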