AppleScript: How to query HTML?

My initial idea of using WKWebView might work, but I can’t get it working with JavaScript/JXA (it lacks support for Objective-C code blocks). You might have more luck with AppleScript/ObjC.

If we’re talking about querying HTML from inside DT for a record stored in DT, this works:

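(() => {
  const app = Application('DEVONthink 3');
  // Look up the HTML record by its UUID and open it in a think window.
  const rec = app.getRecordWithUuid('2F437490-F4EC-4E38-8747-8F4FCC073F86');
  const thinkWindow = app.openWindowFor({record: rec});
  // Run the query inside that window; the value of the script's last expression comes back as a string.
  const result = app.doJavaScript(`var headings = document.querySelectorAll('h1,h2,h3');JSON.stringify([...headings].map(h => h.innerText));`, {in: thinkWindow});
  return JSON.parse(result); // array of heading texts
})()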

As I said before, you can run JS code in a think window displaying an HTML record. The JavaScript code proper is this:

var headings = document.querySelectorAll('h1,h2,h3');
JSON.stringify([...headings].map(h => h.innerText));

It uses the DOM method querySelectorAll to find all headings of level 1, 2, and 3. The resulting value is not an Array but a NodeList, which [...headings] converts to an Array. Then map extracts the innerText property from each HTML element, which is just the text of the heading. The return value of map is again an Array. Since doJavaScript returns a string, JSON.stringify converts this array into a string. In your calling JavaScript code, you can simply use const myArray = JSON.parse(result) to turn it back into an Array.
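The round trip described above can be tried outside DT with a plain string standing in for the value doJavaScript hands back. This is only a sketch: parseHeadings is a hypothetical helper name, the sample headings are made up, and the guard for an empty or non-string result is an assumption about what a failed script might produce.

```javascript
// Hypothetical helper: turn the string returned by doJavaScript back into an Array.
function parseHeadings(result) {
  // Assumption: a failed script may yield an empty or non-string value.
  if (typeof result !== 'string' || result === '') return [];
  const parsed = JSON.parse(result);
  // Only accept an array; anything else is treated as "no headings".
  return Array.isArray(parsed) ? parsed : [];
}

// Made-up sample of the shape the heading script produces.
const sample = JSON.stringify(['Introduction', 'Setup', 'Usage']);
const headings = parseHeadings(sample);
console.log(headings.join(' | '));
```

In the JXA script itself you would pass the value returned by doJavaScript instead of sample.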

querySelectorAll expects a CSS selector as its first parameter. That makes it fairly convenient (at least a lot easier to use than the wretched XPath grammar).
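Because the selector is just a string, the same pattern works for other queries. Here is a rough sketch of a helper that builds the script text for any CSS selector; buildQueryScript is a hypothetical name, not part of DT or JXA, and the link selector is only an illustration.

```javascript
// Hypothetical helper: build the script text for an arbitrary CSS selector.
function buildQueryScript(selector) {
  // JSON.stringify quotes the selector, so quotes inside it cannot break the script.
  const quoted = JSON.stringify(selector);
  return `JSON.stringify([...document.querySelectorAll(${quoted})].map(e => e.innerText));`;
}

// Same query as before: headings of level 1 to 3.
const headingScript = buildQueryScript('h1,h2,h3');
// Illustration only: the text of all links whose href starts with https://.
const linkScript = buildQueryScript('a[href^="https://"]');
console.log(headingScript);
```

Either string could then be passed to doJavaScript in place of the hard-coded script.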