Phil's JXA Tips: Iterating Databases and Records

Ok, I’ve been beating my head against using JXA (that’s “JavaScript for Automation” for those who don’t know) for a while. Something I think should be trivially simple, like say iterating over all the markdown documents in a given database, isn’t trivially simple to arrive at–for me at least. Like so many things, it is trivially simple once you know precisely how to do it. So I’m going to post here from time to time (if I can remember) whenever I’ve got something worth sharing.

Case in point for today: iterating databases and records. I was able to find examples on these forums already about how to use the selected records in the application, and that’s great! I could do a lot of things by selecting a few particular documents in the UI and then kicking off a script. But what if I want to process all the records in a given database? Or all the records in every database? That’s what today’s post is about. So let me start with the former. Take a look at the following code:


var numRecords = 0;      // running total of records seen so far
var iterationDepth = 0;  // current nesting level, used for indentation

// Called once per record; edit this function to do your own processing.
function processRecord(r) {
    console.log("  ".repeat(iterationDepth) + r.name() + " (" + r.type() + ")");
    numRecords++;
}

// Visits each record in the list, then recurses into its children.
function iterateRecords(records) {
    iterationDepth++;
    for (var i = 0; i < records.length; i++) {
        processRecord(records[i]);
        iterateRecords(records[i].children);
    }
    iterationDepth--;
}

(() => {
    const app = Application("DEVONthink 3");
    app.includeStandardAdditions = true;
    const db = app.databases["YOUR-DATABASE-NAME-GOES-HERE"];
    iterateRecords(db.records());
    console.log("Processed a total of " + numRecords + " records");
    return numRecords;
})()

That’s a little framework script that iterates over all the records in a given database and prints them out with a bit of spacing to indicate hierarchy. For those who don’t understand recursion, let me note that each record in a given database may or may not have “children” depending on what kind of thing it is. A non-empty group, for example, will have children whereas a markdown document won’t, at least as I understand it. If you just loop over the records in a database, you’ll only get the stuff at its root level and not the children, so recursing like this is helpful if you want to process all of the records in a database. You can copy/paste this code and then edit the “processRecord” function to do your own thing.
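To collect only certain kinds of records, such as the markdown documents mentioned at the top, the same recursion can take a callback instead of relying on global counters. What follows is only a minimal sketch: walkRecords and collectNamesByType are names made up for illustration, and it assumes that r.type() returns the string "markdown" for markdown documents. The typeof Application guard merely keeps the DEVONthink part from running outside of the Script Editor or osascript.

```javascript
// Hypothetical helper: walk `records` depth-first and call visit(record, depth)
// on each one. `records` only needs `.length` and index access, like the
// lists used in iterateRecords above.
function walkRecords(records, visit, depth = 0) {
    for (let i = 0; i < records.length; i++) {
        const record = records[i];
        visit(record, depth);
        // For records without children the inner loop simply does not run.
        walkRecords(record.children, visit, depth + 1);
    }
}

// Hypothetical helper: names of all records whose type() equals typeName.
function collectNamesByType(records, typeName) {
    const names = [];
    walkRecords(records, r => {
        if (r.type() === typeName) {
            names.push(r.name());
        }
    });
    return names;
}

// Only runs where JXA is available.
if (typeof Application !== "undefined") {
    const app = Application("DEVONthink 3");
    const db = app.databases["YOUR-DATABASE-NAME-GOES-HERE"];
    // ASSUMPTION: "markdown" is the type string reported for markdown documents.
    const names = collectNamesByType(db.records(), "markdown");
    console.log("Found " + names.length + " markdown documents");
}
```

Passing the depth as a parameter means there is no counter to increment and decrement by hand, and the visit callback can be swapped without touching the traversal.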

But what if you want to process all of your databases like this? Sure, you could change the name of the database in the script and run it once for each of your databases. That would work just fine. I don’t know about you, but I already have dozens of DEVONthink databases and would rather not go through that much hassle. That’s where a script more like the following can be helpful:


var numRecords = 0;
var iterationDepth = 0;

function processRecord(r) {
    console.log("  ".repeat(iterationDepth) + r.name() + " (" + r.type() + ")");
    numRecords++;
}

function iterateRecords(records) {
    iterationDepth++;
    for (var i = 0; i < records.length; i++) {
        processRecord(records[i]);
        iterateRecords(records[i].children);
    }
    iterationDepth--;
}

(() => {
    const app = Application("DEVONthink 3");
    app.includeStandardAdditions = true;

    var numDatabases = 0;
    app.databases().forEach(db => {
        var startNumRecords = numRecords;
        console.log("Processing database " + db.name());
        iterateRecords(db.records());
        numDatabases++;
        console.log(db.name() + " had " + (numRecords - startNumRecords) + " records");
    });

    console.log("Processed " + numDatabases + " databases with a total of " + numRecords + " records");
    return numRecords;
})()

That script iterates databases too and logs totals of records for each individual database processed as well as the total for all. Be advised: that one can take a while to run if you have as much data as I do. I hope that saves somebody else some time. Cheers!


If you only need all the records in a database (without knowing their place in the group hierarchy), you can use its contents property:

(() => {
  const app = Application("DEVONthink 3");
  const db = app.databases['Test'];
  db.contents().filter(c => c.type() !== 'group' && c.type() !== 'smart group').forEach(c => {
    console.log(`${c.name()} in "${c.location()}"`)
  })
})()

contents contains all records, including groups and smart groups. Those two are first filtered out and then forEach loops over the real documents. That approach will, however, also find documents in the trash. One can filter them out by comparing their locationGroup’s UUID to the database’s trashGroup UUID, for example.
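The trash check described above could look roughly like this. It is a sketch under assumptions: documentsOutsideTrash is a made-up helper name, and locationGroup, trashGroup and uuid are taken to be the JXA spellings of the corresponding dictionary properties. It also only catches records whose immediate parent is the trash group itself, so anything sitting in a group inside the trash would presumably slip through.

```javascript
// Hypothetical helper: all entries of db.contents() whose immediate parent
// group is not the database's trash group.
function documentsOutsideTrash(db) {
    // Fetch the trash UUID once instead of once per record.
    const trashUuid = db.trashGroup().uuid();
    return db.contents().filter(c => c.locationGroup().uuid() !== trashUuid);
}

// Only runs where JXA is available.
if (typeof Application !== "undefined") {
    const app = Application("DEVONthink 3");
    const db = app.databases["Test"];
    documentsOutsideTrash(db).forEach(c => {
        console.log(`${c.name()} in "${c.location()}"`);
    });
}
```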

A similar approach can return all records in all databases by wrapping the code starting at db.contents() in an app.databases().forEach(db => {...})
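Spelled out, that wrapping might look like the following sketch. listAllContents is a hypothetical name, and the function takes the application object as a parameter only so that it can be tried with stand-in data outside of DEVONthink.

```javascript
// Hypothetical helper: one "database name: record name" line for every
// entry of every open database's contents.
function listAllContents(app) {
    const lines = [];
    app.databases().forEach(db => {
        const dbName = db.name(); // look the name up once per database
        db.contents().forEach(c => lines.push(`${dbName}: ${c.name()}`));
    });
    return lines;
}

// Only runs where JXA is available.
if (typeof Application !== "undefined") {
    const app = Application("DEVONthink 3");
    listAllContents(app).forEach(line => console.log(line));
}
```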

Your approach works well, of course. I’m just not a big fan of recursion. I’d also suggest using forEach rather than for, especially because declaring the loop’s control variable with var makes it function-scoped rather than local to the loop:

https://www.reddit.com/r/javascript/comments/a50jte/is_it_best_to_use_var_or_let_in_for_loop/

forEach prevents you from inadvertently introducing function-scoped variables, requires less typing, and it can be chained with other Array methods (like filter, map etc).
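The scoping difference is plain JavaScript and has nothing to do with DEVONthink, so it can be shown without any records at all. In this small illustration (all function names are made up), each loop stores callbacks that report the loop variable later on:

```javascript
// With var there is a single, function-scoped i shared by all callbacks.
function callbacksWithVar() {
    const callbacks = [];
    for (var i = 0; i < 3; i++) {
        callbacks.push(() => i);
    }
    // By now the loop has finished and i is 3 for every callback.
    return callbacks.map(f => f());
}

// With let every iteration gets its own binding of i.
function callbacksWithLet() {
    const callbacks = [];
    for (let i = 0; i < 3; i++) {
        callbacks.push(() => i);
    }
    return callbacks.map(f => f());
}

// With forEach the index is a parameter of the callback, so it cannot leak.
function callbacksWithForEach() {
    const callbacks = [];
    ["a", "b", "c"].forEach((item, i) => callbacks.push(() => i));
    return callbacks.map(f => f());
}

console.log(callbacksWithVar());     // [3, 3, 3]
console.log(callbacksWithLet());     // [0, 1, 2]
console.log(callbacksWithForEach()); // [0, 1, 2]
```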

Again, thanks for the suggestions. To be clear, I’m not a fan of recursion either; I’m operating largely in the dark here and guessing each step of the way. Is there some documentation reference for DT3’s internal mechanics? I learned just yesterday how to open the script editor dictionary, but I confess I don’t understand how it’s organized or how to use it. And so far I haven’t found much in the way of good reference or tutorial online.

As to for-each versus a for loop, that too is simply because I’m largely operating in the dark here. I was having problems when trying to use for-each loops running into errors about how a type couldn’t be converted–and I have to say the error reporting in the Script Editor is pretty dismal at indicating WHERE the actual problem exists–so I found a bit of code in the forums that worked and stuck with it. I agree, of course, about the preference for for-each for precisely the reasons you mention.

Actually, contents returns only documents, whereas parents returns all groups, smart groups, and feeds.

Not the “mechanics”, but the data structures, methods etc. are documented in the scripting dictionary.

I feel your pain. The idea behind the scripting dictionary is great. But the implementation … The terminology is at best outdated, at worst misleading. Let’s have a look at DT’s database “object” (in fact, it’s a class, but as I said: the terminology is misleading).

Database Object [inh. Item] : A database (supports the standard ‘close’ command).
ELEMENTS
contains contents, parents, records, smartGroups, tagGroups;
contained by application

That tells you that a database object contains other objects – well, no, those are Arrays (lists in AppleScript parlance) of objects. As can be seen by the plural form. So, a database contains arrays of content, parent, record, smartGroup and tagGroup objects. And a database itself is contained by an application object.

The latter is roughly what you get when you call the top-level JXA method Application("DEVONthink 3")
With
app = Application("DEVONthink 3")
you store an Object Specifier (don’t ask me what that really is – I’d loosely translate it as “pointer to an object”) in the variable app. Knowing that this app “contains” databases, you can write:
allDatabases = app.databases;
What you’re doing there is similar to the approach of object-oriented languages: The dot descends into the object to address one of its properties (“Elements” in Apple parlance – they treat single value properties differently from list-valued ones. Terrible.) Now, allDatabases is again an Object Specifier.

We know that this Object Specifier “points” to an Array. In order to work with this underlying structure in JavaScript, we have to “dereference” it using parentheses:
allDatabases().forEach(db => console.log(db.name())) works because allDatabases() gives you a standard JavaScript Array, on which you call the method forEach. The same goes for db.name(): it “dereferences” the name property (!) of a database object, so that it becomes a JavaScript String.

As long as you’re staying in the JXA world, i.e. inside Apple’s scripting architecture, you do not dereference Object Specifiers. As soon as you need objects in the JavaScript world, you must dereference.
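Put together, the steps from Object Specifier to plain JavaScript values can be sketched like this. databaseNames is a hypothetical helper, not something from the scripting dictionary, and the comments describe the behavior as explained above.

```javascript
// Hypothetical helper: the names of all databases as a plain Array of Strings.
function databaseNames(app) {
    const specifier = app.databases;   // no parentheses: still an Object Specifier
    const list = specifier();          // parentheses: a JavaScript Array of databases
    return list.map(db => db.name());  // name() turns each name into a JavaScript String
}

// Only runs where JXA is available.
if (typeof Application !== "undefined") {
    const app = Application("DEVONthink 3");
    console.log(databaseNames(app).join(", "));
}
```

Once the result is an ordinary Array, the usual methods such as map, filter and forEach can be chained on it.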

There’s more in-depth information on JXA and sample code on my website.


Very helpful information. I had figured out a bit of that but was confused as to why some accesses needed the usual dot versus the function call syntax.