"get links of" only returns unique links?

I’ve been working on an AppleScript that processes the links in a page using “get links of”, but I’m confused because it only returns a portion of the links.

This is some test code.

tell application id "DNtp"
	set thisTab to open tab for "file:///Users/user/tmp/sample.html"
	set theHTML to source of thisTab
	set theLinks to get links of theHTML
	set linkCount to 0
	repeat with theLink in theLinks
		if theLink contains "task=SearchProduct" then
			set linkCount to linkCount + 1
		end if
	end repeat
	log (linkCount)
end tell

This counts 24 links.

However, checking via grep I get 52 links:

> grep href sample.html | grep -c task=SearchProduct
> 52

On further analysis it looks like “get links of” only returns unique links. If multiple links in the same piece of HTML have the same href value, the link is only reported once. Is this by design or an undocumented feature?
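
To make the two counts concrete, here is a small JavaScript sketch that runs in plain Node with made-up href values. It assumes the deduplication is by exact href string, which is what my test suggests but not something I have seen documented.

```javascript
// Hypothetical href values; two of them occur more than once.
const hrefs = [
  "/shop?task=SearchProduct&id=1",
  "/shop?task=SearchProduct&id=1",
  "/shop?task=SearchProduct&id=2",
  "/about",
  "/shop?task=SearchProduct&id=2",
];

// Keep only the links of interest, as the AppleScript loop does.
const matching = hrefs.filter((h) => h.includes("task=SearchProduct"));

const totalCount = matching.length;          // every occurrence, as a grep over the source would count
const uniqueCount = new Set(matching).size;  // each distinct href once (assumed "get links of" behaviour)

console.log(totalCount, uniqueCount);
```

With this sample data the total is 4 and the unique count is 2, the same kind of gap as 52 versus 24.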


By design.

Thanks for the quick response.

If you really want all “links” (and you probably mean a elements, if you say that – or do you also want link elements?), you could use fairly simple JavaScript code and send that to the tab with do JavaScript:

const aElements = document.querySelectorAll("a"); 
JSON.stringify([...aElements].map(a => a.href));

Note that this returns a JSON string, as you can’t return lists or anything more complex than a string, number or boolean from a do JavaScript call.

As shown in an older thread, a complete JavaScript script could look like this:

(() => {
  const app = Application("DEVONthink 3");
  app.includeStandardAdditions = true;
  const record = app.selectedRecords[0];
  const tw = app.openWindowFor({record: record});
  // The value of the script's last expression comes back as a JSON string.
  const script = `const aElements = document.querySelectorAll("a"); 
    JSON.stringify([...aElements].map(a => a.href))`;
  const result = app.doJavaScript(script, {in: tw});
  return result;
})()

I don’t bother doing that in AppleScript, since the list of href values is a JSON string, which is probably a PITA to convert to a list in AS. In JS, you’d use:
const arrayOfHrefs = JSON.parse(result);

And then you can do whatever you want with the list of URIs.
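
For example, a rough sketch of counting how often each matching href occurs could look like the following. The `result` string here is a stand-in with made-up URLs for what doJavaScript would return, and `tallyHrefs` is a hypothetical helper name, not part of DEVONthink.

```javascript
// Stand-in for the JSON string returned by doJavaScript (hypothetical URLs).
const result = JSON.stringify([
  "https://example.com/?task=SearchProduct&id=1",
  "https://example.com/?task=SearchProduct&id=1",
  "https://example.com/?task=SearchProduct&id=2",
  "https://example.com/about",
]);

// Hypothetical helper: parse the JSON string and count occurrences
// of every href that contains the given substring.
function tallyHrefs(jsonString, needle) {
  const counts = new Map();
  for (const href of JSON.parse(jsonString)) {
    if (!href.includes(needle)) continue;
    counts.set(href, (counts.get(href) || 0) + 1);
  }
  return counts;
}

const counts = tallyHrefs(result, "task=SearchProduct");
// Sum of all occurrences versus number of distinct hrefs.
const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
console.log(`${total} matching links, ${counts.size} distinct`);
```

The Map keeps one entry per distinct href, so its size corresponds to the deduplicated count, while the summed values give the count of all a elements that match.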