Archive RSS HTML not just Bookmarks

wtagg · April 2, 2013, 2:16pm

Hello,

I use DTPro to download RSS feeds from Instapaper, Twitter, and Tumblr. For Twitter and Instapaper, DTPro only seems to download a bookmark and not a complete HTML file. I’d really like to archive as much as possible from the RSS feed (HTML, images, etc.), but I’m not sure how to do this or even if it’s possible. I’ve searched the forums but have not found anything that addresses this in a way I understand, although I feel pretty confident it has been answered previously.

Can anyone point me in the right direction?

korm · April 2, 2013, 3:04pm

I find the simplest thing is to have one or all of these icons in your toolbar, and click the one you want to use:

“Save as HTML”, “Save as Web Archive”, “Save as PDF”, or “Save as Rich Text” (the icon says “Note”, it means “rich text” – RTF).

The buttons cause a new document to be created in the parent group of the group where the source document is found, so there’s a bit of fumbling around involved in locating the new document. Sort of buggy, that part, but at least the document creation aspect does what you want.

wtagg · April 2, 2013, 3:16pm

Thanks, Korm. That’s a great tip, easy to understand and will help me in other ways.

But I was hoping to find some approach that would do this automatically. What happens now when I open the database, e.g., “Social Media Archive”, is that DTPro will download the RSS feed without any further intervention from me. So far, so good. But, for reasons I can’t figure out, the download will result in a bookmark for some services (Twitter, Instapaper) and an HTML file for others (Tumblr). Ideally, I’d like to arrange this so that I could choose which type of file results from the download: plain text for Twitter, PDF for Instapaper, and HTML for Pinboard and Tumblr.

Any ideas on how I could configure this? Not sure if DTPro can do this without any scripting.

korm · April 2, 2013, 3:42pm

AFAIK, the format for the feed (bookmark vs. HTML) is controlled by the feed provider and not DEVONthink.
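One way to see what a given provider actually ships is to look at the feed items themselves. The sketch below is a minimal Python example, run outside DEVONthink, that labels each item as carrying body text or only a link. It assumes a plain RSS 2.0 feed; the function name `describe_items` is made up for illustration, and whether DEVONthink decides between bookmark and HTML in exactly this way is an assumption, not something confirmed here.

```python
import xml.etree.ElementTree as ET

# Qualified name of the optional <content:encoded> element (RSS content module).
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def describe_items(rss_xml):
    """Label each RSS 2.0 <item> as 'full content' or 'link only'.

    Hypothetical helper: it only inspects the feed text and does not
    reproduce any DEVONthink behaviour.
    """
    root = ET.fromstring(rss_xml)
    report = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "untitled").strip()
        # Prefer <content:encoded>, then fall back to <description>.
        body = item.findtext(CONTENT_ENCODED) or item.findtext("description") or ""
        kind = "full content" if body.strip() else "link only"
        report.append((title, kind))
    return report
```

Feeding it the saved XML of each feed would show whether the Twitter and Instapaper items differ from the Tumblr ones in this respect.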

I don’t completely understand the requirement, but a script seems necessary.

wtagg · April 2, 2013, 3:53pm

Thanks, Korm. That’s what I suspected.

Here’s an example of what I would like to happen:

Open database
RSS feed from feed provider (Pinboard) requested and downloaded as “bookmark” file
URL in bookmark file used to prompt download/generation of HTML.
Once the HTML file has been downloaded/generated and saved in the same folder as the bookmark file, the bookmark file is deleted, leaving the HTML file in its place.
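The steps above can be sketched as a small standalone script. This is a minimal Python example under stated assumptions, not a DEVONthink script: it works on the feed text directly instead of on bookmark files, it assumes a plain RSS 2.0 feed (RDF-based RSS 1.0 or Atom feeds would need namespace-aware lookups), and the names `item_links`, `safe_name`, and `archive_feed` are hypothetical. Replacing a bookmark inside a DEVONthink database would still need DEVONthink's own scripting support.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path


def item_links(rss_xml):
    """Return (title, link) pairs for every <item> that has a link."""
    root = ET.fromstring(rss_xml)
    pairs = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "untitled").strip()
        if link:
            pairs.append((title, link))
    return pairs


def fetch_url(url):
    """Download the raw bytes behind a URL (no retries or error handling)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def safe_name(title):
    """Turn an item title into a conservative file name."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("_") or "untitled"


def archive_feed(rss_xml, out_dir, fetch=fetch_url):
    """Save the page behind each feed item as an .html file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for title, link in item_links(rss_xml):
        target = out / (safe_name(title) + ".html")
        target.write_bytes(fetch(link))  # the HTML stands in for the bookmark
        saved.append(target)
    return saved
```

Calling `archive_feed(feed_text, "archive")` with the downloaded feed text would write one HTML file per item. Items with identical titles would overwrite each other in this sketch, and images referenced by the pages are not downloaded.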

I’m not sure that my assumptions about what needs to be downloaded or deleted are correct, but I hope the outcome is clear. If this kind of operation were possible to automate, I would like to extend it a bit further to specify the types of files that could be generated from the bookmark URL. For example, it might be useful to have an RSS feed archived as a series of PDFs. It might also be useful to have an RSS feed that contains URLs to images archive the image files instead.
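Choosing the file type per URL could be sketched by looking at the Content-Type header the server returns. The Python example below is a rough illustration under assumptions: the mapping table and the names `extension_for`, `fetch_with_type`, and `save_by_type` are invented for this sketch, and it only saves what the server sends. Rendering a web page to PDF is a separate step that it does not attempt.

```python
import urllib.request
from pathlib import Path

# Hypothetical mapping from media type to file extension; extend as needed.
EXTENSIONS = {
    "text/html": ".html",
    "text/plain": ".txt",
    "application/pdf": ".pdf",
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
}


def extension_for(content_type, default=".bin"):
    """Map a Content-Type header value to a file extension."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, default)


def fetch_with_type(url):
    """Return (body bytes, Content-Type header) for a URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read(), response.headers.get("Content-Type", "")


def save_by_type(url, stem, out_dir, fetch=fetch_with_type):
    """Download url and save it under out_dir with a type-based extension."""
    body, content_type = fetch(url)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / (stem + extension_for(content_type))
    target.write_bytes(body)
    return target
```

With this, a feed item whose link points at an image would be stored as the image file itself, while an ordinary page would be stored as HTML.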

Does this make sense? I’m interested enough in solving this problem to learn about what it would take to do so. But I’m just not sure what steps I should take first. I’ve spent a small amount of time learning Ruby and have dabbled in AppleScript but, at this point, I couldn’t write anything from scratch. If writing a small script is the answer, I would love to know how to start. Maybe this post belongs in another DT forum?