Time to fully integrate Readability into DT

acoyne · December 29, 2010, 10:39pm

Have a look at Reeder, the best RSS reader out there, in its Mac version: you can set it to automatically retrieve the main text of any web page, by default, using the Readability algorithm, instead of the slow-loading, hard-to-read original.
I have a script installed to load the Readability version of pages in DT, after the original has already been loaded, but it would be very cool if, like Reeder, DT could be instructed to go and get the text of every page, by default, using Readability, and download it — that is, instead of the original page, rather than in addition to it.
Is this possible? Does anybody know how to script it, in the interim?

rollo · January 5, 2011, 12:05am

I enthusiastically second this suggestion. I apply Readability to every page before I capture it into DTP.

tommysundstrom · January 5, 2011, 11:40am

I have a Ruby script that takes the URL of a PDF page, runs it through Readability, and replaces the PDF page with an RTF file. (If you know Ruby, it could easily be modified to do the same with any kind of document that has a URL.)

It does have a significant effect on the accuracy of Move to/See Also.

The script can be found here:
github.com/tommysundstrom/DevonThink-helper

rollo · January 11, 2011, 3:29pm

Prompted by acoyne’s mention above, I checked into Reeder, and find it absolutely fantastic. It is missing just one thing that would probably save me an hour (maybe even several hours) every day, and I want to know if anyone here has the answer.

My DTP document capture workflow currently operates like this:

I capture links that appeal to me to Instapaper with a single click. I mostly browse on my iPhone, and the source of the links is usually a carefully curated Twitter list or RSS feeds.
In Instapaper I open up each item in a separate tab in Safari. I typically save 200 or more items a day this way. I then apply Readability to each tab. Then I use the Take Rich Note service to capture the formatted note to DTP. And now I’m also beginning to tag files as they land in DTP. It takes up a lot of time each day.

I experimented with Reeder. It is fantastic, not least because it can format documents by default using Readability. I set up an RSS feed from Instapaper, added it to my Google Reader account and was then able to browse everything I had captured very quickly and efficiently in Reeder, all nicely formatted (doesn’t work for every site, but does most of the time).

I then used the same Take Rich Note service to get the content into my DTP database. Everything about it is excellent … except for one thing.

The missing part of the jigsaw is that it doesn’t capture the URL of the piece, so there’s nothing in the URL field. Instead the URL is embedded in the title of the piece.

I often need to refer back to the source material, or pass on links to people, so I refer to the URL field all the time. In addition I have smartfolders set up to capture sources by URL. That means that within my database I can rapidly find everything sourced from, for example, The New York Times, or any other source. But this won’t work if the URL field is empty.

So my question is this: What can be done to transfer Rich Text content from Reeder with a single keystroke, but also capture the URL of the piece?

Would it be possible to write another service that does exactly that? It’s not the sort of thing I could do, but I can’t imagine it would be hard?

If there is another answer, I’d be delighted to know, because the Instapaper > Reeder (with Readability on by default) > DTP workflow will save me an unbelievable amount of time.
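
For the empty URL field described above, one possible stopgap is to pull the URL back out of the title after capture. The Python sketch below only shows the string handling, not any DEVONthink or Reeder scripting. It assumes the title contains the full `http://` or `https://` URL somewhere in it (the exact format Take Rich Note produces is not stated here), and the function names `split_title_and_url` and `source_host` are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: the captured title contains the source URL somewhere in it,
# e.g. "Some headline http://www.nytimes.com/2011/01/10/story.html".
URL_PATTERN = re.compile(r"https?://\S+")


def split_title_and_url(title):
    """Return (clean_title, url); url is None when no URL is found."""
    match = URL_PATTERN.search(title)
    if match is None:
        return title.strip(), None
    # Drop punctuation that often trails a URL pasted into running text.
    url = match.group(0).rstrip(").,;")
    # Remove the URL from the title and collapse the leftover whitespace.
    remainder = title[:match.start()] + " " + title[match.end():]
    clean_title = " ".join(remainder.split()).strip(" -|")
    return clean_title, url


def source_host(url):
    """Return the host part of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


if __name__ == "__main__":
    title, url = split_title_and_url(
        "Some headline http://www.nytimes.com/2011/01/10/story.html"
    )
    print(title)
    print(url)
    print(source_host(url))
```

The recovered URL is what would go into the URL field, and the host is the kind of value a smart folder grouping by source could match on. Writing it back into DTP would still need a separate script.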

azarias · September 12, 2012, 8:09pm

So, what happened?

Full Readability integration into DT would be great. I can’t properly archive my articles from Reeder to DT!

This is something that - up to now - Evernote handles a lot better than DT.

Greg_Jones · September 12, 2012, 10:24pm

I think you are giving Evernote more credit than perhaps they deserve. Reeder does all the conversion & exporting to Evernote on their end. Have you contacted Reeder’s author to inquire if he might add support for doing the same with DEVONthink?

BLUEFROG · September 13, 2012, 12:14am

Amen! Oops, couldn’t resist.

azarias · September 13, 2012, 10:00am

It’s true that Reeder offers direct integration into EN. But EN offers email import, which is great for other apps that can share via mail but have no integration with Readability.

Does DT have this, too?

PS: Just wrote the Reeder dev asking for DT integration.

Greg_Jones · September 13, 2012, 12:58pm

DEVONthink does not have an email server/user account to forward emails to. However, you can subscribe to the RSS feed of your Readability account (or Instapaper, or practically any other RSS feed) in DEVONthink.

It’s been a while since I’ve tried Evernote, but in the past I found it fairly painless to get data in to Evernote, but far harder to get anything back out. Don’t know if that has changed any or not…

azarias · September 13, 2012, 1:12pm

That’s not the workflow I would like to accomplish.

Instead, I want to use DT for archiving interesting articles that I have already read and want to refer to later.

It should work like this:

Take input from any iOS app / Mac app (like Reeder, but also all the custom apps from newspapers and so on).
Run it through Readability or Instapaper.
Store it in DevonThink.

It should especially work on iOS, e.g. through the DT To Go app.

Greg_Jones · September 13, 2012, 2:55pm

But the workflow is the same, except the DTTG step is bypassed. I read feeds in Reeder on my iPhone, save the ones I want to archive to Instapaper, and they appear in DEVONthink as an RSS feed. How is this materially different from your desired workflow?

azarias · September 14, 2012, 10:07am

I don’t want an RSS feed. I want a permanent archive in DT (web archive or PDF). The RSS feed does not do that. It would make DT a reading program, which Instapaper already is, and it would not fulfill the purpose I use DT for: archiving documents.

Greg_Jones · September 14, 2012, 10:37am

Right-click on the RSS feed page and it’s all there: web archive, PDF, Capture Page to make an HTML document, or select some text and choose Capture Note to get an RTF document of the selection. I suspect that one could also write a simple script to attach to the feed’s group to automate the conversion, but I’ve not explored that.