Javascript, Regex, Custom Metadata

SlickSlack · November 10, 2024, 8:58pm

Struggling a bit with my paltry Javascripting skills.
I have a few hundred new OCR’d PDFs,(invoices, receipts, expense reports)
I have to extract a minimum of data from them into custom metadata. Amounts, Dates.
They’re all pretty good quality digital documents, not scans.

I think my best bet is to figure out the regex for each piece of metadata from each set of docs and get this job done with a few passes. i.e. the first set I looked at was 57 Uber receipts. I can extract the total using RegEx but can’t get the last step to get it into custom metadata. Script Editor errors out as undefined.
My basic regex is inquiries\.\nCA\$(\d\d\.\d\d) and my script is below.
I don’t know how to change the result of the regex to the right value format. my custom metadata field is called “amount” and it’s set to currency.
Note: this script was cobbled together from my attempt at learning from other scripts on this forum. I am a bit lost at this point and not sure what’s what.
Thanks

function performsmartrule(records) {
const app = Application ("DEVONthink 3");
app. includeStandardAdditions = true;

const RE = /inquiries\.\nCA\$(\d\d\.\d\d)/; 
records.forEach(r => {
const txt = r.plainText();
const extractedAmount = txt.match (RE);
const md = r.customMetaData() || {};
md["amount"] = extractedAmount;
r.customMetaData = md;
})
}


(() => {
  const app = Application.currentApplication();
  if (app.name() !== "DEVONthink 3") {
    performsmartrule(Application("DEVONthink 3").selectedRecords());
  }
})()

chrillek · November 10, 2024, 9:51pm

Try using addCustomMetaData instead of manipulation the property directly.
That does definitely work.
Also, are you sure that your script doesn’t work? As the main function doesn’t return anything, it’s not surprising to see undefined as a result value. An error would result in an error message.

meowky · November 11, 2024, 4:40am

In a smart rule, instead of

records.forEach(r => {...})

be advised to use

records.map(r => app.getRecordWithUuid(r.uuid())).forEach(r => {...})

This reconstruction of the records array circumvents certain bugs associated with peculiarities of the performsmartrule JavaScript function (see this thread for the relevant discussion).

meowky · November 11, 2024, 4:56am

This is the reason your regex didn’t work as expected:

BLUEFROG · November 11, 2024, 6:19am

Why wouldn’t you use…?

const re = /inquiries\.\nCA\$(\d*\.\d{2})/;

or…

const re = /inquiries\.\nCA\$([\d.,]+)/;

…which should cover from $3 to $.25 to $2,432.81 to $4,352,112.

chrillek · November 11, 2024, 6:52am

Interesting. Except from the thread you mentioned, I’ve never had any trouble with using records.forEach on its own. But I’m only using internal scripts in smart rules.