Most efficient way to get all nested tags (JXA)

mdbraber · March 21, 2024, 3:54pm

Paging @chrillek and other JXA experts. What’s the most efficient way to get all the (nested) tags in the correct order (based on their nesting, so similar to the order in the sidebar) as an array of objects like {id: "apps/pkm/devonthink", name: "devonthink"}?

I’ve created the following recursive script but I’m wondering if there’s a more efficient / optimized way (to always lower time spent when executing scripts). Thanks!

const app = Application("DEVONthink 3");
app.includeStandardAdditions = true;
const db = app.databases['Resources'];

function getChildren(item, tags) {
	item.children.whose({_match: [ObjectSpecifier().type, "group"]})().forEach(c => {
		tags.push(`${c.location().replace('/Tags/','')}${c.name()}`);
		if (c.children.whose({_match: [ObjectSpecifier().type, "group"]})().length > 0) {
			getChildren(c, tags)
		}		
	})
	return tags;
}

tags = getChildren(db.tagsGroup, []);
tags_list = tags.sort().map(tag => { return {id: tag, name: tag.split("/").pop()} })

I thought about a way to avoid the sort() afterwards, but as the items are unordered I need to sort at least once.
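One caveat with a plain sort() on the joined paths: it compares whole strings, so a sibling whose name contains a character that sorts before "/" (such as "-") can land between a parent and its children. A minimal sketch of a segment-wise comparison follows. The name `comparePaths` and the sample paths are hypothetical, and whether the result matches the sidebar order exactly is an assumption (the sidebar may sort case-insensitively or by locale).

```javascript
// Compare two "a/b/c" paths segment by segment, so that a parent
// always sorts directly before its own children.
function comparePaths(a, b) {
  const as = a.split("/");
  const bs = b.split("/");
  const n = Math.min(as.length, bs.length);
  for (let i = 0; i < n; i++) {
    if (as[i] !== bs[i]) return as[i] < bs[i] ? -1 : 1;
  }
  // All shared segments are equal: the shorter path (the parent) goes first.
  return as.length - bs.length;
}

// Hypothetical sample paths.
const samplePaths = ["apps/pkm", "apps-old", "apps", "apps/pkm/devonthink"];
const plainSorted = samplePaths.slice().sort();
const segmentSorted = samplePaths.slice().sort(comparePaths);

console.log(plainSorted.join(", "));
console.log(segmentSorted.join(", "));
```

With these sample paths the plain sort puts "apps-old" between "apps" and "apps/pkm", while the segment-wise version keeps the "apps" subtree together.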

BLUEFROG · March 21, 2024, 3:57pm

Out of academic curiosity or do you have a real use case?

mdbraber · March 21, 2024, 4:01pm

No, I definitely have a real use case. I’m using this script to create “implicit tags” in my annotation documents, because I don’t want to use nested tags the way Obsidian does. DEVONthink does this already (which is great!): when tagging devonthink, for example, it automatically adds apps and pkm, which is the behavior I also want in other places.

Therefore I create a list of all tags in DT and process those through a script so I can input devonthink and it will output ["apps","pkm"]
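A minimal sketch of that lookup, assuming the tag list has the {id, name} shape from the first post. The function name `impliedTags` and the sample data are hypothetical, and if the same name occurs in several places in the hierarchy this sketch simply takes the first match.

```javascript
// Hypothetical sample in the {id, name} shape described in the first post.
const sampleTags = [
  { id: "apps", name: "apps" },
  { id: "apps/pkm", name: "pkm" },
  { id: "apps/pkm/devonthink", name: "devonthink" },
];

// Return the ancestor tag names implied by a given tag name.
function impliedTags(tagList, name) {
  const hit = tagList.find(t => t.name === name);
  if (!hit) return [];
  // Drop the last path segment (the tag itself) and keep its ancestors.
  return hit.id.split("/").slice(0, -1);
}

console.log(JSON.stringify(impliedTags(sampleTags, "devonthink")));
```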

This way DT is always my single source of truth for tags and hierarchy and I can dynamically get the current list and order of tags. As I’m running the script quite often (e.g. on saving a Markdown document in Obsidian) I’m looking for the fastest way to create the nested tag list.

BLUEFROG · March 21, 2024, 4:05pm

So you are using nested tags in DEVONthink?

and it will output ["apps","pkm"]

Is this some Obsidian convention?

mdbraber · March 21, 2024, 4:10pm

Yes I am - but other applications (like Obsidian) define nested tags as #apps/pkm/devonthink which doesn’t work (e.g. because it’s considered just a single tag in Finder). So I just use ‘flat tags’ ordered via DT. So a document is tagged “apps”, “pkm” and “devonthink” (3 separate tags) but DT knows there’s actually an order for those tags.

So yes I used “nested tags” in the DT-way of putting it (creating a hierarchy known to DT, but using individual tags for each document). That way works great and is my preferred method.

When I tag a document in Obsidian, I want also to only tag it with devonthink and have it automatically add the (implied) tags apps and pkm. That’s what I have created already and it works 100% as wanted. Now I’m looking if I can optimize some things, like the most efficient way to get tags.

No, this is just my notation to indicate an array (which I process further via Javascript and ultimately outputs Markdown)

cgrunenberg · March 21, 2024, 4:18pm

The easiest ways to find them are…

tell application id "DNtp" to return parents of current database whose tag type is ordinary tag and tags is not {}

…or…

tell application id "DNtp" to return search "kind:ordinary tag item:tagged" in (root of current database)

Needs sorting by location though.

BLUEFROG · March 21, 2024, 4:35pm

This seems like a lot of excess work. Why don’t you simply create a text replacement, e.g., ::dtpk resolves to #apps/pkm/devonthink ?

mdbraber · March 21, 2024, 5:02pm

Thanks @cgrunenberg for pointing this out. I’m struggling a bit with the right JXA (I do all of this in JXA because other parts of my code are plain JS integrating with this). I think it needs to look something like this:

const app = Application("DEVONthink 3");
const db = app.databases['Resources'];

let tags = db.parents.whose({_and: [
	{_match: [ObjectSpecifier().tagType, "ordinary tag"] },
	{_not: [{tags: "" }] }
]})();

tags = tags.map(x => x.location().replace("/Tags/", "") + x.name())
console.log(tags)

But when I map through the location() and name() it seems it’s querying each individual object again (at least I think it’s doing that if I look at the Script Editor console). Which is probably making it less efficient again.

I tried writing this:

let tags = db.parents.whose({_and: [
	{_match: [ObjectSpecifier().tagType, "ordinary tag"] },
	{_not: [{tags: "" }] }
]}).name();

… which is blazing fast, but as I need the ‘full location’ (location+name) I can’t use that. I’ve created this (somewhat ugly) workaround which leaves most of the work to JS rather than DT - it is blazing fast though.

const app = Application("DEVONthink 3");
const db = app.databases['Resources'];

let tag_locations = db.parents.whose({_and: [
	{_match: [ObjectSpecifier().tagType, "ordinary tag"] },
	{_not: [{tags: "" }] }
]}).location();

let tag_names = db.parents.whose({_and: [
	{_match: [ObjectSpecifier().tagType, "ordinary tag"] },
	{_not: [{tags: "" }] }
]}).name();

tags = tag_locations.map((x,i) => x.replace("/Tags/","")+tag_names[i]);
tags_list = tags.sort().map(tag => { return {id: tag, name: tag.split("/").pop()} })
console.log(tags_list)

Is there a (theoretical) chance that these two queries would deliver items in a different order?
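As a cheap safeguard, the combining step can at least verify that both queries returned the same number of items. This is only a sketch: `zipTagPaths` and the sample arrays are hypothetical, and a length check catches a changed item count, not a reordering.

```javascript
// Combine parallel arrays of locations and names into full tag paths.
// Throws if the lengths differ, which would mean the two queries
// did not describe the same set of items.
function zipTagPaths(locations, names) {
  if (locations.length !== names.length) {
    throw new Error(
      `length mismatch: ${locations.length} locations, ${names.length} names`
    );
  }
  return locations.map((loc, i) => loc.replace(/^\/Tags\//, "") + names[i]);
}

// Hypothetical values standing in for the results of the two whose queries.
const sampleLocations = ["/Tags/", "/Tags/apps/", "/Tags/apps/pkm/"];
const sampleNames = ["apps", "pkm", "devonthink"];
const zipped = zipTagPaths(sampleLocations, sampleNames);

console.log(zipped.join("\n"));
```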

Thanks for thinking along! If there are additional / better ways I’m definitely interested!

EDIT: thanks @cgrunenberg, this is already 6-10x faster!

% time osascript -l JavaScript get-tags.scpt
osascript -l JavaScript get-tags.scpt  0.60s user 0.17s system 41% cpu 1.857 total
% time osascript -l JavaScript get-tags-optimized.scpt
osascript -l JavaScript get-tags-optimized.scpt  0.06s user 0.02s system 28% cpu 0.290 total

cgrunenberg · March 21, 2024, 5:17pm

I’m not sure about the internals of AppleScript/JXA but at least DEVONthink’s part should return the same order. Another possibility might be to query the properties instead of name and location but I didn’t check whether this would be indeed faster (on the one hand it’s just one query, on the other hand it returns more data)

mdbraber · March 21, 2024, 5:33pm

Thanks. I wrote a quick version to query properties() but it’s around 3x slower than the version above.
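The script behind that timing is not shown here, so the following is only a sketch of the post-processing such a version would need. It assumes each property record exposes `name` and `location` keys. `tagsFromRecords` and the sample records are hypothetical.

```javascript
// Turn an array of property records into the sorted {id, name} tag list.
// In JXA the records would presumably come from one bulk properties
// query on the whose specifier (an assumption, not shown in the thread).
function tagsFromRecords(records) {
  return records
    .map(r => r.location.replace(/^\/Tags\//, "") + r.name)
    .sort()
    .map(id => ({ id, name: id.split("/").pop() }));
}

// Hypothetical records; real ones would carry many more fields,
// which is the extra data such a query has to transfer.
const sampleRecords = [
  { name: "pkm", location: "/Tags/apps/" },
  { name: "apps", location: "/Tags/" },
];
const tagsFromProps = tagsFromRecords(sampleRecords);

console.log(JSON.stringify(tagsFromProps));
```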

% time osascript -l JavaScript get-tags-optimized.scpt
osascript -l JavaScript get-tags-optimized.scpt  0.05s user 0.02s system 32% cpu 0.228 total
% time osascript -l JavaScript get-tags-optimized2.scpt
osascript -l JavaScript get-tags-optimized2.scpt  0.38s user 0.02s system 65% cpu 0.616 total

So far it seems that the queries for just single properties work fastest

chrillek · March 21, 2024, 5:58pm

I would not overdo the whose. It’s a bit clumsy, and JavaScript’s filter looks cleaner to me.

What about this:

(() => {
  const app = Application("DEVONthink 3");
  const db = app.databases['YOURDATABASE'];
  const tagLocations = db.parents()
    .filter(p => p.tagType() === 'ordinary tag' && p.tags().length)
    .map(t => {
      const n = t.name();
      return {tag: n, location: t.location().replace('/Tags/', '') + n};
    });

  const tags_list = tagLocations.sort((a, b) => a.tag > b.tag ? 1 : (a.tag < b.tag ? -1 : 0));
  console.log(tags_list.map(t => `${t.tag}: ${t.location}`).join('\n'));
})()

It seems to do what you want, though I’m not 100 percent sure of that.
Main differences:

Only one run over db.parents()
Doesn’t use whose
Builds only one array tagLocations, consisting of {tag: tagName, location: tagLocation} objects.

I’m not quite sure what your last sort().map() sequence is doing (apart from the sort, that is). And I didn’t benchmark that – I have only about 34 tags in the DB I could use for testing.

mdbraber · March 21, 2024, 8:05pm

Thanks for chiming in @chrillek! Totally agree the whose statements are quite clumsy. But they seem the most efficient so far. I’ve re-written your version a bit to check the difference, but the whose version is 3 times faster it seems.

(() => {
const app = Application("DEVONthink 3");
const db = app.databases['Resources'];
const tags_list = db.parents().filter(p => p.tagType() === 'ordinary tag');
tags = tags_list.map(x => x.location().replace("/Tags/","")+x.name());
tags.sort();
})();

The .sort() at the end is just for functional purposes (and to compare). These are the results:

% time osascript -l JavaScript get-tags-optimized.scpt
osascript -l JavaScript get-tags-optimized.scpt  0.05s user 0.02s system 27% cpu 0.255 total
% time osascript -l JavaScript get-tags-optimized2.scpt
osascript -l JavaScript get-tags-optimized2.scpt  0.47s user 0.03s system 72% cpu 0.690 total
% time osascript -l JavaScript get-tags-optimized3.scpt
osascript -l JavaScript get-tags-optimized3.scpt  0.19s user 0.08s system 46% cpu 0.570 total

Most obvious is that the whose statements only need two replies (as Script Editor calls them), while the .map() version needs 2 replies (location and name) for each tag (I have 273 tags in this DB).
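To put rough numbers on that, here is a back-of-the-envelope count using only the figures from this post. It ignores the initial fetch of db.parents() and the tagType() calls in the filter version, so the real total for the .map() approach is higher still.

```javascript
// Rough count of query/reply round trips for each approach,
// using the figures from the post: 273 tags, 2 properties per tag.
const tagCount = 273;
const propsPerTag = 2; // location and name

// whose version: one bulk query per property.
const whoseRoundTrips = propsPerTag;
// map version: one query per property per tag.
const mapRoundTrips = tagCount * propsPerTag;

console.log(`whose: ${whoseRoundTrips}, map: ${mapRoundTrips}`);
```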

I think I’ve reached the end of the line in terms of efficiency unless I can find a way to get two properties returned (location and name) at once. JXA is still a very functional language for this type of work, but it’s also clumsy as hell ;-)

mdbraber · March 21, 2024, 8:31pm

Well, it seems that the end of the road might be JSObjC, but I’m pretty sure I don’t want to go that route to shave off 100 ms.

chrillek · March 21, 2024, 8:37pm

JXA is just the interface to AppleEvents. The language is JavaScript. whose belongs to JXA, filter to JS. Good job with demonstrating the advantage of the former over the latter here!
Depending on how often you call this script, performance might matter. If you use it only once in a while, it’s probably irrelevant.
One could perhaps cram the locations into an object with the tag names as keys. I don’t know if that would change anything performance-wise, though.
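A sketch of what such a name-keyed object could look like, built from the two bulk query results. `indexByName` and the sample arrays are hypothetical. Note that if the same tag name exists under two different parents, the later entry overwrites the earlier one.

```javascript
// Build a lookup from tag name to its full path.
function indexByName(locations, names) {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const index = Object.create(null);
  names.forEach((name, i) => {
    index[name] = locations[i].replace(/^\/Tags\//, "") + name;
  });
  return index;
}

// Hypothetical values standing in for the location and name query results.
const tagIndex = indexByName(["/Tags/", "/Tags/apps/"], ["apps", "pkm"]);

console.log(tagIndex.pkm);
```

Looking up a tag by name then becomes a plain property access instead of a scan over the whole list.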

cgrunenberg · March 22, 2024, 5:58am

A future release might add a “location with name” property. I’ve missed this a few times already too.

cgrunenberg · March 22, 2024, 6:16am

It’s possible in AppleScript but not faster than two queries:

tell application id "DNtp"
		{name, location} of (parents of current database whose tag type is ordinary tag and tags is not {})
end tell

mdbraber · March 22, 2024, 7:41am

That would be great!

chrillek · March 22, 2024, 8:53am

Just wondering: can tag type be “ordinary tag” and tags empty ever be true at the same time? Wouldn’t that mean to have an empty tag?

cgrunenberg · March 22, 2024, 9:05am

Sure. It’s just a tag without (enclosing) tags.

chrillek · March 22, 2024, 9:56am

Thanks!