Possible to identify URLs from captured HTML documents and then re-capture the identified URLs as PDF/Markdown?

I am new to DT and have really enjoyed this fantastic tool over the past few days!

I am trying to capture URLs as PDF/Markdown from HTML documents subscribed from an RSS feed, but have not found a feasible solution.

Here is what I would like to achieve:

  1. Subscribe to an RSS feed in HTML format in DT
    • The reason for not subscribing to the feed directly as PDF is that the feed does not render nicely in DT, mainly because it is re-generated from another feed, and it seems that one cannot set the format for individual feeds.
  2. Capture URLs in the HTML documents and save them as PDF/Markdown within DT
    • The URL on the HTML page points to the original address of the RSS feed.
    • In all subscribed HTML documents, the URL is located at roughly the same place, i.e. the bottom of the page.

Is there a workaround for this? Perhaps a smart rule with a script could be used?

I would really appreciate it if any suggestions can be provided. :grinning:

A smart rule executing a script should indeed be an option; see e.g. the AppleScript commands “get links of” and “create PDF document from”.

Thanks for the response. I am new to DT and AppleScript. In fact, I just figured out how to execute an AppleScript together with smart rules. :sweat_smile:

Can you elaborate on the following?

  • Where can I find “get links of” and “create PDF document from”?
    • I did not find them when searching online or in DT. Are they DT specific?
  • Since one HTML document may contain several links, is it possible to locate the desired link, i.e. the one at the bottom?

Those commands are in DEVONthink’s AppleScript dictionary.
Open Apple’s Script Editor and choose File > Open Dictionary. Select DEVONthink and you can browse through its commands, classes, and properties.

And no, you can’t look for the bottom link directly, but the “get links of” command returns a list. You could get item -1 of the list, assuming the desired link is actually the last one.
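To illustrate the idea of "collect every link in document order, then take the last one", here is a minimal sketch in plain Python. It is not DEVONthink's AppleScript API: the `LinkCollector` class, the `extract_links` helper, and the sample HTML are all hypothetical and only stand in for what a link-listing command would hand back.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page: the link of interest sits at the bottom.
sample_html = """
<html><body>
<p><a href="https://example.com/first">First</a></p>
<p><a href="https://example.com/original-article">Original</a></p>
</body></html>
"""

links = extract_links(sample_html)
# Index -1 counts from the end, like "item -1" of an AppleScript list.
last_link = links[-1] if links else None
```

The empty-list check matters in practice, since a feed item without any links would otherwise raise an error.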

Great! I found those commands for DT. A few follow-up questions:

  • I can select a group/feed and use a smart rule to get the links of an HTML page from the feed. I further checked the HTML from the feed, and the desired link should be the second-to-last one. Does this mean that it is item -2 of the list?
  • How do I save/accumulate the desired link of each HTML document into a single document, or something else that helps to download all the linked articles as PDF in one action? For this purpose, there are two use cases. Can they be achieved in some way?
    • Whenever a new feed article arrives, the desired link is extracted and appended to the desired document.
    • One selects all the HTML documents from the feed and extracts all desired links into one single document with one action.
  • Now I know where to find those AppleScript commands for DT, but I am not sure how to use them in an AppleScript for this case. Is it possible to provide an AppleScript file or some guidance that I can start working from?

Typing “tell application id DNtp” into the search field for this forum should give you plenty of points to start from.

Yes, the second-to-last item would be -2, the third from last -3, etc.
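As a rough sketch of the "one link per article, gathered into one document" use case, the plain Python below picks the second-to-last link from each article's link list and joins the results into newline-separated text. This is only the logic, not a DEVONthink script: `pick_link`, `collect_links`, and the `feed_items` data are hypothetical, and each inner list stands for the links already extracted from one HTML document.

```python
def pick_link(links, index=-2):
    """Return links[index], or None when the list is too short for that index."""
    try:
        return links[index]
    except IndexError:
        return None


def collect_links(link_lists, index=-2):
    """Pick one link per document, skipping documents with too few links."""
    chosen = []
    for links in link_lists:
        link = pick_link(links, index)
        if link is not None:
            chosen.append(link)
    return chosen


# Made-up data: one inner list per feed article.
feed_items = [
    ["https://example.com/nav", "https://example.com/article-1", "https://example.com/footer"],
    ["https://example.com/only-link"],  # too short for index -2, so it is skipped
    ["https://example.com/nav", "https://example.com/article-2", "https://example.com/footer"],
]

# One URL per line, ready to be saved as a single text document.
link_document = "\n".join(collect_links(feed_items))
```

Skipping short lists rather than failing keeps a batch run going when one article has an unexpected layout. The same loop covers both use cases: run it over a single new article and append the result, or run it over the whole selection at once.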

I will take a look at your query further in a bit. Thanks for your patience and understanding.

Thanks for the tip. Indeed, there are many useful scripts and resources.