making "get links of" more useful

Instead of “get links of” just returning a simple list of the URL strings, I would find it way more useful in my parsing scripts if it returned a list of records where each record contained the link URL as well as the text content of the link.

For example, when run on the following source …
this is the text
… it would return …
{{theLink:“”, theText: “this is the text”}}

Additionally, it would be very helpful if the list were returned in source text order.

Thank you.

I second this script request!

It would be great to have more powerful “get links of” scripting capabilities, so this could be used on other forums, imaging sites and so on…

I would love to see this too.

Regex support is of course one of the things that AppleScript really should provide for us, but doesn’t. If one has the source HTML, it should be relatively trivial to extract both parts of each link. But it is not that simple, as links can be quite complex and involve embedded junk, JavaScript, etc. Over the long run, I’ve found AppleScript extremely annoying because of its lack of regex support, and started using Python for all seriously complex string manipulation work. Python’s Beautiful Soup library does a really great job of parsing HTML. It is error tolerant and walks the whole DOM. I would love to see DEVONthink and DEVONagent support Beautiful Soup or some other DOM parser directly.

And I agree that making “get links of” return a list of key-value pairs (URL plus link text) would be great.
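
To make the parser idea concrete, here is a minimal Python sketch of what such a "links with text" result could look like. It uses the standard library's html.parser as a stand-in for Beautiful Soup so that nothing needs to be installed. The names LinkCollector and get_links_with_text and the example.com URLs are made up for illustration, and the record keys theLink and theText are borrowed from the request above. None of this is part of DEVONthink or DEVONagent.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect one record per <a href=...> element, in source order.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.links = []      # finished records, in source order
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Keep text only while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append({"theLink": self._href, "theText": text})
            self._href = None


def get_links_with_text(html):
    """Return a list of {"theLink": ..., "theText": ...} records."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="http://example.com/a">this is <b>the</b> text</a></p>'
    print(get_links_with_text(sample))
```

Because the parser reports tags in the order it meets them, the records come out in source text order without any extra sorting. Markup nested inside a link (bold, images and so on) is ignored and only the text is kept.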

But so as not to be totally annoying, here is a solution that you can use. I’m even willing to clean up the code a bit more and turn it into a clean subroutine, if anyone is interested. Several years ago I figured out a way to issue a single-line command to Perl that runs the kind of regex needed to do exactly what you are asking for, absent a built-in command that returns pairs. The nice thing is that the regex is driven from plain AppleScript and requires no extra libraries, only the BSD sublayer and Perl.
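
For anyone curious what the regex route looks like, here is a rough Python sketch of the general idea. To be clear, this is not the Perl one-liner mentioned above, just an illustration of the same kind of pattern matching. The pattern is deliberately simple: it only handles quoted href values and assumes links are not nested. The names LINK_RE, TAG_RE and regex_links are made up for the example.

```python
import re

# Simplified pattern (an assumption, not a full HTML grammar):
#   group 1 = the quote character, group 2 = the URL, group 3 = the link body
LINK_RE = re.compile(
    r"""<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1[^>]*>(.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)

# Used to strip any markup nested inside the link body.
TAG_RE = re.compile(r"<[^>]+>")


def regex_links(html):
    """Return (url, text) pairs in the order they appear in the source."""
    pairs = []
    for match in LINK_RE.finditer(html):
        url = match.group(2)
        # Drop nested tags, then collapse runs of whitespace.
        text = " ".join(TAG_RE.sub("", match.group(3)).split())
        pairs.append((url, text))
    return pairs


if __name__ == "__main__":
    sample = '<a class="x" href="http://example.com/a">this is <b>the</b> text</a>'
    print(regex_links(sample))
```

This works well on tidy markup but will miss unquoted href values and can be fooled by comments or script blocks that contain link-like text, which is the "links can be quite complex" problem again.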

It was posted to the board three years ago, and I just updated it for DEVONthink Pro. It can be found here:

Hopefully it’s helpful!


Eric o

It is since v2.0.5.