Regular Expressions and Smart Rules v3.5

Indeed, see https://developer.apple.com/documentation/foundation/nsregularexpression?language=objc

Great! Might I suggest adding a reference to this page (not sure if it’s static enough) or ICU in general in the manual? Most regular expressions work, but the dialects can confuse you if you’re not using the right one. Much like natural language :slight_smile:

For those who are using both DT and DTTG, Shortcuts also uses ICU regex.

Let’s say I scan the text using regex and store the match. In my particular case I look for the Dutch word for invoice, “Factuur”, and grab the number that comes right after it, i.e. the actual invoice number.
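As a rough illustration of that kind of match, here is a minimal JavaScript sketch. The helper name `extractInvoiceNumber`, the exact pattern, and the sample strings are assumptions made up for this example, not something taken from a real invoice or from DEVONthink itself. It only uses basic features (a character class and one capture group), so the pattern should carry over to ICU regex more or less unchanged, but that is worth testing on your own documents.

```javascript
// Hypothetical helper: returns the first run of digits that follows "Factuur"
// on the same line, or null if there is none.
function extractInvoiceNumber(text) {
  // "Factuur", then any filler that is neither a digit nor a line break
  // (e.g. "nummer: "), then the digits we want in capture group 1.
  const match = /Factuur[^\d\n]*(\d+)/i.exec(text);
  return match ? match[1] : null;
}

console.log(extractInvoiceNumber("Factuurnummer: 20230117"));
```

The capture group (the part in parentheses) is what a smart rule would hand on as the matched text; everything outside it only anchors the match.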

Ideally I would want to do something like this:


Here ContractNo is a user-defined metadata field.

Unfortunately, user-defined metadata cannot (yet) be set using regex. I hope this will become possible in the future. For the moment, I’m wondering what to change in the script above to capture the regex output and then set my metadata. Is selection above the output of the regex?

Would a tag with the matched text work for you?

Only text-based custom metadata can currently use placeholders and regex results.

I can offer the JavaScript below. It looks for the largest euro amount in a document. For US dollars, you’ll have to adjust:

  • the regular expression amount after the commented-out section
  • probably the removal of the € sign and other characters in var digits =
  • the name of the metadata field (it’s “Betrag” here)
  • the if … else immediately after that: remove it, since it is necessary only for regions where the comma is used as a decimal separator and the dot as a thousands separator
  • however, for the US you might want to consider removing blanks from the string before converting it to a number

The script loops over all selected documents, determines the maximum amount and then asks for user confirmation to set the user meta data field. I added this confirmation because the approach does not work with some discounts: In those cases, the maximum amount in the document might not be the amount to be paid.

(() => {
var app = Application('DEVONthink 3');
app.includeStandardAdditions = true;

var sel = app.selection();

/*
var amount_comma = "[\\d.]+(?:,\\d\\d)";
var amount_dot = "\\d+(?:\\.\\d\\d)";
var amount_euro = "(?:\\s*(€|(EUR|Eur)[oO]))?";
var amount = new RegExp("(" + amount_comma + ")|("+ amount_dot + ")" + amount_euro, "g");
*/
var amount = /(([\d.]+(?:,\d\d))|(\d+(?:\.\d\d)))(\s*(€|(EUR|Eur)[oO]?)|(?![\d.]))/g;
sel.forEach(el => {
  var betrag = app.getCustomMetaData({for: "Betrag", from:el});
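  var txt = el.plainText();
  var matches = txt.match(amount); /* array of all amounts found, or null if there are none */
  var maxAmount = 0; /* largest amount seen so far in this document */

  if (matches !== null && betrag === null) {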
  matches.forEach(euro => {
    var digits = euro.replace(/[€a-z\s]/gi,""); /* remove all letters, euro sign and spaces */
    if (digits.match(/,\d\d$/)) {
      /* decimal separator = comma: remove all dots from number and replace comma with dot */
      var val = Number(digits.replace(/\./g,"").replace(/,/,"."));
    } else {
      /* number contains commas, probably US format: remove them */
      var val = Number(digits.replace(/,/g,""));
    }
    maxAmount = val > maxAmount ? val : maxAmount;
  });
  var answer = app.displayDialog("Betrag setzen?", {defaultAnswer: maxAmount + " €",
     buttons: ["Nein", "OK"],
	 defaultButton: "OK"});
  if (answer.buttonReturned === "OK") {
    app.addCustomMetaData(answer.textReturned, {for: "Betrag", to: el});
  }
  }
  });
 })();
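For anyone who wants to try the number handling outside DEVONthink first, here is a small stand-alone sketch of just the normalisation and maximum steps. The function names `parseAmount` and `largestAmount` and the sample strings are assumptions for illustration only; the logic follows the if … else in the script, but this is not the script itself, and it can be run in plain Node or a browser console.

```javascript
// Hypothetical helper mirroring the normalisation step of the script.
function parseAmount(raw) {
  // Strip letters, the euro sign and whitespace ("1.234,56 EUR" -> "1.234,56").
  const digits = raw.replace(/[€a-z\s]/gi, "");
  if (/,\d\d$/.test(digits)) {
    // Comma is the decimal separator: drop thousands dots, turn the comma into a dot.
    return Number(digits.replace(/\./g, "").replace(",", "."));
  }
  // Otherwise treat any commas as thousands separators and remove them.
  return Number(digits.replace(/,/g, ""));
}

// Largest amount among a list of matched strings (0 if the list is empty).
function largestAmount(matches) {
  return matches.reduce((max, m) => Math.max(max, parseAmount(m)), 0);
}

console.log(largestAmount(["12,50 €", "1.234,56 EUR", "99.00"]));
```

Keeping the parsing in its own function makes it easy to check odd cases (discount lines, amounts without a currency sign) before letting the script write anything into the metadata field.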

Thanks @cgrunenberg. I didn’t realise this. For my immediate purposes it doesn’t matter whether it’s a text or a numeric quantity. Following up, I created a new custom metadata field, InvoiceNo, and tested it using the following smart rule.


The solution is not as elegant as storing in numeric form but it works fine. The potential for auto-processing of documents is definitely there so I encourage further development and the ability to set other types of metadata via regex.

@chrillek Thanks for the industrial strength solution using JS. Also a good option to explore for multiple documents. I like the idea of looking for the largest amount (which should correspond to the total).

Thanks. That is also an option, and I can confirm that it works when combining a regex that uses capturing parentheses with the Add Tags option.

But beware of these pesky discounts… e.g., Deutsche Telekom Mobile bills cannot be processed reliably with this script. But that might not concern you :wink:
Since you seem to be working on Dutch or Flemish documents, the euro-specific parts of the script may still be useful for you.
BTW: I think that alphanumeric invoice numbers are OK. At least I seem to be getting invoices like “WRYYYY####” or “YYYY-##”. Your mileage may vary, of course.
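If you do need to catch such alphanumeric numbers, a pattern along these lines might be a starting point. This is only a sketch: it assumes that “YYYY” stands for a four-digit year and each “#” for a single digit, and the helper name `findInvoiceId` and the sample strings are made up for the example.

```javascript
// Hypothetical helper: finds an invoice id shaped like "WRYYYY####" or "YYYY-##".
function findInvoiceId(text) {
  // Either "WR" followed by 8 digits (year + 4-digit counter),
  // or 4 digits, a hyphen and 2 digits that are not followed by
  // another hyphen or digit (to skip dates such as 2023-07-15).
  const match = /\b(?:WR\d{8}|\d{4}-\d{2}(?![-\d]))\b/.exec(text);
  return match ? match[0] : null;
}

console.log(findInvoiceId("Rechnung WR20230042 vom 15.07.2023"));
```

The second alternative is quite loose, so in practice you would probably anchor it to a label such as “Factuur” or “Rechnung” to avoid false hits.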