Automatically retrieve page linked from RSS feed?

Is it possible to set up an RSS feed such that the link URL in each item is automatically fetched and stored in DEVONthink as either Rich Text or a webarchive?

The use case is, for example, automatically saving a copy of each article posted to a news site.

Not automatically, no. It is possible to use scripts to process the articles in a feed, though.

If you set DEVONthink > Preferences > RSS > Remove Articles to “manually”, then the feed will remain intact until you delete the articles. So, you could come along at any time, select a number of articles, and use Data > Convert > to Rich Text on the whole selected block of articles.

Or, you could use DEVONthink to monitor a feed.

korm: Unfortunately, for feeds that only provide headlines or summaries and not full articles, that would just convert the headlines or summaries into a Rich Text document. I was hoping there’d be a way DT could follow the link to the full article and capture it automatically. It appears not without some scripting work.
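For anyone weighing up the "scripting work", here is a rough sketch of the first step: pulling the link URL out of each item in a feed. It is plain Python with the standard library and does not use DEVONthink's own scripting interface. The sample feed, the example.com URLs, and the `item_links` name are all made up for illustration, and it assumes a simple RSS 2.0 feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical summary-only feed, standing in for a real news site's RSS.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/articles/1</link>
      <description>Summary only.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/articles/2</link>
      <description>Another summary.</description>
    </item>
  </channel>
</rss>"""


def item_links(feed_xml):
    """Return (title, link) pairs for every <item> that has a <link>."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items that carry no link at all
            links.append((title, link))
    return links


for title, link in item_links(SAMPLE_FEED):
    print(title, "->", link)
```

Each link could then be fetched and saved in whatever format you prefer. Atom feeds and feeds that use XML namespaces would need different element lookups.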

Note that converting web content into Rich Text is usually an ugly compromise. Just saying.

That’s true, it’s not pretty. I mainly just wanted to collect a searchable text archive over time, though, so it’s not essential that it be an exact replica of the web page.

The Download > Convert URLs to web Documents script, available from the Support Assistant (in the Help menu), might be useful. It grabs the URL from the RSS article and downloads a web archive of that page.

YMMV – every site does RSS differently, and the URL in the feed might be a redirect rather than the URL of the actual page you’ll want to save.
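If you want to check whether a feed's URL is a redirect, something like the following could help. It is only a sketch in standard-library Python, not part of DEVONthink or the script mentioned above. The `resolve_final_url` name and its `opener` parameter are my own inventions, and it only catches HTTP redirects, not ones done in JavaScript or via meta refresh.

```python
import urllib.request


def resolve_final_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return the URL reached after following any HTTP redirects.

    `opener` is a hypothetical hook so the function can be tried
    without a network connection; by default it is urllib's urlopen,
    which follows HTTP redirects on its own.
    """
    with opener(url, timeout=timeout) as response:
        # The response records the URL of the resource actually retrieved.
        return response.url
```

Comparing the result with the URL in the feed item shows whether the feed points at the article itself or at an intermediate address.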

You can select multiple articles and run this conversion script.

Oh, that’s pretty handy! Thanks for pointing it out.

Not yet but a future release will support this.

Is this supported yet by chance?

See Preferences > RSS > Format (or Format option in Info inspector/popover of feeds).

Fantastic. Thank you.

The RSS preferences won’t work for my feeds from thehill.com. The feeds come in two types, but neither works:

http://thehill.com/rss/syndicator/19109
http://thehill.com/taxonomy/term/1131/

I get only the abbreviated page, not the URL.

I’ve tried all of the options, from Automatic to WebArchive – none work, although many new ones have downloaded.

The script to Download as WebArchive etc. works, but changing the preferred format doesn’t work no matter what I do, or whether I do it in preferences or individual feeds.

That script isn’t available for a Smart Rule – not sure why not.

Any tips here? Thanks!

Setting the Preferences > RSS > Format to Web Archive and using the first feed URL, I see…

image

But bear in mind, this is dependent on the network connection and responsiveness of the contacted servers.

  • Have you quit and relaunched DEVONthink?
  • If so, does this persist after a machine reboot?

That did the trick. It may have been the computer reboot, but I’m not certain. I also turned off Remove Clutter, which I suspect is what actually fixed it. The WebArchive setting is working beautifully. Thank you.

You’re welcome 🙂