Using a single smart rule instead of Hazel (maybe)

Some people (me included) use a naming convention for their files and put them in certain groups depending on sender and/or content. I’m doing that for bank statements, some of the regularly arriving invoices and income statements.

In DT, you’d need one smart rule per target group, since it’s not possible to move a record to a group depending on some criteria. So if you have five bank accounts and two companies regularly sending you invoices, you’d need seven separate rules. Which is not good, because they do mostly the same thing: OCR the PDF if necessary, extract the relevant information (date, account no, sender, and the like). So I decided to (Java)script the whole thing. You can read about it here and download the script.

As it stands, it will not run in your installation of DT: you’ll have to adjust the database and group names and of course the regular expressions to extract the date and so on. But it might be useful for someone – if only as a starting point.


Holy s**t!!
This makes me feel like you’ve been standing behind me for the last month, looking over my shoulder, listening to me mumble about how I should be able to automate all this, if only I had more scripting skills.
Can’t wait to dig into this and give it a shot. Test database, here I come!

Thanks Thanks Thanks.

Thanks for the script; I look forward to reviewing the details

I’m not a Hazel user, but I use a script to assist me in processing inbox records; structured name, tags, … and movement out of the Inbox

Instead of a smart rule, I trigger my script manually on selected notes

The way the script is written, that’s possible too.

I spent a good chunk of today digging into this script. I am simultaneously learning JavaScript for Automation, brushing up on my RegEx, and picking up some technical German. It’s been quite a ride so far.

To make it simple for myself I tried to get one handler to work on one kind of statement. I’ve figured out how to get Amount and some other info from the OCR text. I’m slowly grinding through each bit of the script and have run into a problem with the part where it moves it to the group.
I get an error like so

app.move({record:app.databases.byId(8).contents.byId(5464), to:"Rogers"})
		--> Error -50: Parameter error.

The script is here

I can edit this post if it’s better to include the script in the post.
Thanks

You’re passing a string to the move method in its to parameter, but it is expecting a group there.

Edit after I had breakfast:

In the original script, the target group is saved in the global object uidMap like so

   …
        const groupFound = app.search(`name: ${group} kind:group kind:!tag`, {in: app.databases[db]});
        if (!groupFound || groupFound.length === 0) {
          error = true;
          message += `\nGroup ${group} for uidMap[${key}] does not exist!`;
        } else {
          /* set groupRec entry to group object */
          obj.groupRec = groupFound[0];
        }

That’s happening in checkUidMap(). Further down the line (in processRecord()) the code looks like this

  const group = uidInfo.group;
  /* Move the record to its new location */
  app.move({record: record, to: group})

That’s obviously my mistake: It should be const group = uidInfo.groupRec;

The error stems from a previous version of the script which replaced the string for the group in uidMap with the group itself. That caused some problems when the script ran several times, so I changed the code in checkUidMap() to add a new attribute groupRec. Should’ve used that, too.

My bad.
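
The difference between the two properties can be sketched without DEVONthink. Everything here is a stand-in: fakeMove only mimics the way app.move() rejects a string in its to parameter, and the uidMap entry is a made-up example, not the one from the script.

```javascript
// Made-up uidMap entry: "group" is the name, "groupRec" the object found for it.
const uidMap = {
  rogers: { group: "Rogers", groupRec: { name: "Rogers", type: "group" } }
};

// Stand-in for app.move(): it wants a group object in "to", not a name.
function fakeMove({ record, to }) {
  if (typeof to === "string") {
    throw new Error("Parameter error.");
  }
  return `${record} moved to ${to.name}`;
}

const uidInfo = uidMap.rogers;

let outcome;
try {
  // The bug: passing the group's name
  outcome = fakeMove({ record: "statement.pdf", to: uidInfo.group });
} catch (e) {
  outcome = e.message;
}
console.log(outcome);

// The fix: passing the group object stored in groupRec
const fixedOutcome = fakeMove({ record: "statement.pdf", to: uidInfo.groupRec });
console.log(fixedOutcome);
```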


Though you didn’t ask for it, please allow me to comment on a part of your script, namely

let rawDate   = txt.match(/([A-Za-z]{3})\s(\d+),\s(\d{4})\s1\sof\s/);

First, you shouldn’t use let for anything that does not (read: must not) change. Use const to make sure that you do not inadvertently modify values.
Second, using \s in a regular expression is in many cases risky: You can’t be sure that there’s exactly one space. Especially if the text comes from an OCR run. I’d rather use \s+ if you’re sure that there is always at least one space or even \s* if you can’t be sure if there’s a space at all.
Then, you use
const rawMonth = txt.match(/([A-Za-z]{3})\s(\d+),\s(\d{4})\s1\sof\s/);
why? You have already used the exact same regular expression for rawDate – there’s no point in running the same search twice. Especially since you’ve captured the month already in the first capturing group for rawDate. Your month is in rawDate[1].

const rawAmount = txt.match(/(\d+[.]\d\d)\nWe will charge the credit/);
While this is syntactically correct, using [.] instead of \. is overkill. A character class with a single element is just this element, and writing it as such makes that clear (ok, maybe not so much in the case of a dot :wink:).
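
To illustrate the points about \s+ and about running the expression only once, here is a minimal sketch. The sample text and the helper name extractStatementDate are made up for illustration; they are not taken from your script.

```javascript
// Hypothetical OCR text: note the doubled spaces an OCR run may introduce.
const sampleText =
  "Statement date\nMar  5, 2021  1 of 3\nTotal due\n123.45\nWe will charge the credit card";

// One expression, run once; \s+ tolerates one or more whitespace characters.
const dateRE = /([A-Za-z]{3})\s+(\d+),\s+(\d{4})\s+1\s+of\s/;

function extractStatementDate(txt) {
  const rawDate = txt.match(dateRE);
  if (!rawDate) return null;             // match() returns null if nothing matched
  const [, month, day, year] = rawDate;  // capturing groups 1 to 3
  return { month, day, year };
}

const statementDate = extractStatementDate(sampleText);
console.log(statementDate);
```

With the original single \s, the doubled spaces in this sample would prevent a match.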


Thanks for all that @chrillek
I figured there was a syntax problem somewhere but I couldn’t find it.


As for all the other issues you mentioned: I am piecing together this script from your example and at the same time learning how javascript works. My method is to get it to do one kind of bill first and then build it back up with other handlers and make the detection and extraction of data more useful and less error prone.


For the line

let rawDate   = txt.match(/([A-Za-z]{3})\s(\d+),\s(\d{4})\s1\sof\s/);

I had changed only the regex of that line. It was let and not const in your script.
I’m still learning what everything does so I’ll go back over it and see what all that means.


re: const rawMonth
In your Telekom handler you extract the month in one subroutine but I haven’t figured out where and how to use it yet. I was trying to get the script to run without errors.


re: [.]
In yours you had [.,] which must be because of European notation variations. I simplified it but I can see what you mean by keeping it clearer if there’s only one possible decimal notation.


re: \s, \s+, and \s*
Good to know.


Thanks for all your help and guidance.

Is there a JavaScript version of Script Debugger? Something that shows the progression of variables and such as the script runs. I found that very helpful with learning AppleScript.


@chrillek, would you like to deliver the bad news? :wink:


It’s been a request for quite some time but unfortunately it seems unlikely that it will ever happen.


Ah well. In a “no pain no gain” manner I guess I am going to learn JXA like everyone else; at the metaphorical non-carbon emitting coal face, chipping away, a chunk at a time.

What FUN!

I got it working for one kind of record and have been able to extract the account number and the full billing amount and the sales tax. I’ve learnt a lot about how things are working and how @chrillek was testing his records for variations. Verrrrry educational. There are probably a few places where I’ve simplified it too much for it to be of any use in sorting through real-world records, but for the moment I am happy with how far I got, thanks to @chrillek’s starting script. I also tried out writing a new subroutine(?) to strip the hyphens from the account number so it would look better once inserted in the name.

The next thing I have to figure out, and I’ve been searching for hints here and on the wider web, is how to get some of the data into custom metadata fields. I know the basis for this is
app.addCustomMetaData(rawAmount[1],{for: "amount", to: r});
where rawAmount[1] is the string that equals the amount and amount is the identifier of the custom metadata field in DEVONthink.

But when I add that line before the app.move line I get an error saying
can't find variable: rawAmount
yet rawAmount is right there and I’ve been able to include it in the name in some tests.
Methinks the writing of metadata requires some extra manipulation that I am not able to grok.

Here’s a link to my script.
(I’ve been doing this for less than a week so it’s going to be a bit messy)


Thanks for the kind words. I’ll try to address several questions in this single post.

Debugging JXA

That’s a sore point. There is no JXA debugger per se; the guys from Late Night Software abandoned their project afaik. There are, however, some other possibilities:

  • Check the Apple Events in the Script Editor’s log.
  • Poor man’s debugging with console.log(). That writes to the Script Editor’s log if the script is running there or to the terminal if it’s running in osascript.
  • Put a debugger; line in your script at the place where you want debugging to start. Sounds like magic, but it merely starts Safari with its web developer tools open at the line following this statement. I guess that you’ll have to turn on Safari’s developer tools for that (Preferences, last tab to the right, “turn on developer menu”).

The latter method is a bit quirky in that it sometimes works as advertised and sometimes not. Also, it does not give you any information about JXA objects. So it helps with standard JS code, but not really with the intricacies of JXA. But it’s better than nothing, I guess.

Scope of JS variables … aka

why addCustomMetaData does not work in your script as you think it should.

Which is a good thing, too :wink: Let’s get back to the point where rawAmount is defined:

function handleRogers(key, r, txt) {
...
const rawAmount = txt.match(/(\d+\.\d\d).../);
// and a bit later
resultObj.amount = rawAmount[1];

So, the first line says that rawAmount is a const. Which not only means that its value is not supposed to change after it has been defined (i.e. after this very line). It also means that rawAmount is “block local”. A block is, loosely speaking, everything limited by curly braces {}. Here, the nearest open curly brace is the one after the function definition for handleRogers. Long story short: rawAmount is known inside this function, but not outside.

But there’s good news, too. The last line quoted above assigns the first element of this local variable rawAmount to the property amount of the object resultObj. Which is the return value of the function handleRogers.

If you look at the function processRecord(), you’ll see that it calls the handler function and stores its result in its own block-local variable result (well, one could think of a better name ;-). It is already used here to build the name of the record:
const name = ["company", "subject", "date" ].map(x => result[x]).join('_');
(one of the more obscure ways to write JavaScript code, and I’m aware that I wrote it).
So, instead of using rawAmount[1] in your call to addCustomMetaData(), you’d use result.amount (or result["amount"], if you prefer that notation).
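
A stripped-down sketch of that mechanism, runnable outside DEVONthink. handleSample is a simplified, hypothetical stand-in for a handler such as handleRogers, and the text passed to it is made up.

```javascript
// Simplified stand-in for a handler; not the one from the script.
function handleSample(txt) {
  const resultObj = {};
  const rawAmount = txt.match(/(\d+\.\d\d)/); // known only inside this function
  if (rawAmount) {
    resultObj.amount = rawAmount[1];
  }
  return resultObj; // the value leaves the function in the returned object
}

const result = handleSample("Total due 123.45");

// Outside the function, rawAmount does not exist ...
console.log(typeof rawAmount);  // "undefined"
// ... but its first capturing group travelled out in the result.
console.log(result.amount);     // "123.45"
console.log(result["amount"]);  // same value, bracket notation
```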

var vs. let vs. const

In the olden days, fortunately long gone, nobody bothered to declare variables in JavaScript. Very much like in Basic, Fortran, Lisp and of course AppleScript, one would just write me = "myself" and some lines later probably me = "you". Or, worse, Me = "you". Quite obviously a typo, but one that did go unnoticed until the code went haywire.

So var was introduced as a means to say “Hey, I have a variable with a certain name that I want to stick to” and "use strict"; as a means to tell the JavaScript engine that every variable should be declared. So you’d have to write var me = "myself";, and using Me = "you"; would cause the JS engine to sputter about “Can't find variable: Me”. Just as you saw in your code with rawAmount[1].

But var is not block-local. A variable declared with var is visible throughout the whole function it is declared in, and if it is declared outside of any function it is global: every function can see it. And modify it – horribile dictu. Global variables have been personae non gratae for ages, so JavaScript introduced let. Which does just the same as var, namely declare a variable with a certain name. But let limits its scope to the current block (think “curly braces”). So code outside of this block has no idea about this variable and cannot inadvertently modify it. Only code inside the block can see and consequently modify these variables. Which is good.

So why would one need const? Well, there are a lot of values (some would say most of them) in a program that never change. Or never should change, not even by accident. Therefore, you should use const to declare a variable (which no longer is variable, but well…) that must never change its value. In all other respects, it works like let, i.e. the variable is block-local.
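
A small sketch of these scoping rules in plain JavaScript. The function and variable names are made up for the demonstration.

```javascript
function scopeDemo() {
  if (true) {
    var viaVar = "visible in the whole function";
    let viaLet = "visible only inside this block";
    const viaConst = "block-local and not reassignable";
  }
  // Outside the block, only the var declaration is still known.
  return {
    viaVar: typeof viaVar,     // "string"
    viaLet: typeof viaLet,     // "undefined"
    viaConst: typeof viaConst  // "undefined"
  };
}

function reassignConst() {
  const amount = "10.00";
  try {
    amount = "20.00";          // throws a TypeError at run time
  } catch (e) {
    return e instanceof TypeError;
  }
  return false;
}

console.log(scopeDemo());
console.log(reassignConst());  // true
```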

There will be a lot more about that to be found online, including examples. As usual, I suggest the excellent Mozilla Developer Network to get a more detailed explanation.


That’s amazing. You could have just linked to Mozilla Developer Network and left it at that, so I truly appreciate the time you’ve taken to explain these concepts.
With the adjustments I made based on your guidance it did all the things I need it to. So great.

Quick question…
Do I need one of these:

if (rawAmount) {
  resultObj.amount = rawAmount[1];
}

for every metadata entry?
I am guessing this just checks for any value in rawAmount and passes it to the handler object?
I tried it as just
resultObj.amount = rawAmount[1];
and it works. I’m not sure what the if statement is accomplishing here.

I am so close to adding new handlers for all the different bills I get every month. This is so exciting. Last step is moving from test database to real world but that’s a little while away.

That’s a matter of taste. But yes, that’s probably the best approach. In processRecord, you simply check if ("amount" in resultObj) and call addCustomMetaData() if that’s true.

Alternatively, you could use a sub-object of resultObj like so resultObj.metadata.amount =... and in processRecord use a loop going over all properties of this sub-object. This is probably a better approach if you have several custom meta data fields that may or may not be used for all records.

for (const md in resultObj.metadata) {
  /* md is the identifier of the meta data field, resultObj.metadata[md] its _value_ */
  app.addCustomMetaData(resultObj.metadata[md], {for: md, to: record});
}

Well, if the regular expression could not match anything, rawAmount would be null and rawAmount[1] would consequently raise an error. The if works around that – why should the whole rule fail if only a single value could not be determined?
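
Both ideas, the guard and the metadata sub-object, can be tried outside DEVONthink with a sketch like this one. addCustomMetaDataStub is a stand-in that only records what would be written; the handler, its regular expressions and the sample text are assumptions for illustration.

```javascript
// Stand-in for app.addCustomMetaData(); it only records what would be written.
const written = [];
function addCustomMetaDataStub(value, options) {
  written.push(`${options.for}=${value}`);
}

// Hypothetical handler with two values that may or may not be found.
function handleSampleBill(txt) {
  const resultObj = { metadata: {} };
  const rawAmount = txt.match(/(\d+\.\d\d)\s+total/);
  const rawAccount = txt.match(/Account\s+(\d+)/);
  // Store a value only if its expression matched; match() returns null otherwise.
  if (rawAmount) resultObj.metadata.amount = rawAmount[1];
  if (rawAccount) resultObj.metadata.account = rawAccount[1];
  return resultObj;
}

// The account number is missing in this text, so only the amount is written.
const bill = handleSampleBill("Invoice\n42.50 total");
for (const md in bill.metadata) {
  addCustomMetaDataStub(bill.metadata[md], { for: md, to: "record placeholder" });
}
console.log(written);
```

Without the two if statements, the missing account number would raise an error and nothing at all would be written.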

Lots of progress has been made. Now processing four kinds of utility bills with great success. Extracting text to metadata is going well too. The newest handler took less than an hour to get working now that I’ve got some kind of handle on it.

But it turns out my bank provides PDFs with horrendous and unusable OCR. I want to add an OR statement in the check for OCR and force the bank statements to be OCR’d afresh. Searching the web I find that || is supposed to be Javascript for an or so the if should be:

    if (record.wordCount() === 0) || (company = "Bank Name Here") {
      /* No text layer or crappy original OCR, perform OCR */
      ocrRecord = app.ocr(record.path(), { file: record.path(), waitingForReply: true});
      app.delete({record: record, in: record. parents[0]});
    }

but I get an “Unexpected Token” error when I try to save the script.
Is that the only way to do an If/or statement?

Endless thanks

First: an if condition must be enclosed in parentheses. All of it. That’s what the error is referring to.

Second: I don’t use === because there’s a discount on =. Check the documentation for assignment and comparison.

Not to contradict you, but that sounds a bit ominous. I know of companies sending PDFs without a text layer (notably German Telekom for their mobile customers), but a company producing a PDF and then OCRing it themselves? That makes no sense (to me).

Which is not to say that it doesn’t happen – banks are weird in some ways. E.g., the German branch of ING sends out PDFs where the text layer runs in columns – you get all the dates first, then all the relevant text, and finally all the amounts. Impossible to process automatically.

That part is directly from your script. I found five === in the original script. But I know nothing about the nuances of comparison types because I need to…

Oh yes. I have been slowly absorbing JS and JXA documentation as I go through this process. You know when you’re climbing a mountain and you always seem to be 10 meters from the starting hut, no matter how long you’ve been hiking? That’s me.
I’ll see if I can go three or four days just developing my version of the script without running back here for help.

That’s pretty close to what I got from my bank’s PDF. A tall narrow stream of text. I compared two statements from a year apart and they seemed similarly unusable. Oh well, I get to learn about comparison statements because of all that.

Sorry, I meant to be funny, which apparently did not work out. So:
=== is for comparison without casts. Which means 0 === 0 is true, whereas 0 === '0' is not. While == casts its operands, so that 0 == 0 is as true as 0 == '0'. Therefore you should always use === to compare two values unless you’re absolutely sure that you want the side effects of ==.

So yes, there are of course five occurrences of === in my code, and they’re all intentional.

Never use = for comparison. This is the assignment operator, and a condition like company = 'Bank' will always be treated as true (also, company will effectively be set to Bank afterwards, regardless of its original value).
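
Put together, a corrected condition could look like the following sketch. needsOcr is a made-up helper that takes plain values so it can run anywhere; in the script they would come from record.wordCount() and from wherever you keep the company name.

```javascript
function needsOcr(wordCount, company) {
  // The whole condition sits inside one pair of parentheses,
  // || is the logical OR, and === compares without type conversion.
  if (wordCount === 0 || company === "Bank Name Here") {
    return true;
  }
  return false;
}

console.log(needsOcr(0, "Rogers"));            // true: no text layer
console.log(needsOcr(250, "Bank Name Here"));  // true: forced for the bank
console.log(needsOcr(250, "Rogers"));          // false

// == converts its operands before comparing, === does not.
console.log(0 == "0");   // true
console.log(0 === "0");  // false
```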

Then I doubt that OCRing that again will change anything. I tried with my ING statements, in DT and in Abby and in PDFPen, and they all are absolutely certain that they have to spit out the text columnwise. But I do not really need that text anyway, so I didn’t bother.


In my very preliminary experience doing the OCR again did seem to get a better result.
At this stage of development I won’t be chasing down how all the files came to me and in what state they were in.
I generally don’t OCR any documents, or haven’t up to this point, so I am assuming the OCR layer I am seeing in DEVONthink is how that file was provided and I will modify the script with that in mind. One of the great things about your script, now that I have it working with three handlers and processing 7 different kinds of statements/bills/receipts, is how little work I have to do to get them filed and “out of mind”. Specifically finding the values and adding them to custom metadata: that part seems like magic, and knowing how it works in the background means I can worry less about things changing with how the files come to me.

I am honoured that you would try out some scripting humour on me, this early in my journey. It obviously went waaaay over my head but I will endeavour to catch up.