Script to download Instapaper version of any link to DT

Okay, this is pretty cool, and it gets us a step closer to the Readability integration I argued for in this post.
Say you see a link on a page you want to save and read later. But you don’t want to save just the link, and you don’t want to have to load the entire web page (slow) and save that (slow a second time, when you come to load it for reading). There’s Instapaper and (better) Readability, but in order to use them (at least in DT) you have to load the page first, then convert it, then save it to DT.
This script saves a good many of those steps. Just copy the URL of the link you’re interested in to the clipboard (in Safari or DT, by control-clicking and selecting “Copy Link” from the contextual menu). The script does the rest: downloads the markup from the Instapaper version of the page, then uses it as the source to recreate it in DT. Presto: you get just the body text of the page associated with the link, saved in DT.
(Readability has a better algorithm than Instapaper, but I don’t know how to tell it to deliver the text to me: unlike Instapaper, it doesn’t create a separate page with its own URL to point the “download markup” command at. Anyone got an idea?)
Here’s the script:

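tell application "DEVONthink Pro"
	
	-- The clipboard should hold the URL of the page you want to save
	set theRawUrl to (the clipboard)
	
	-- Remove "http://" from the URL; the Instapaper address below adds it back in encoded form
	set search_string to "http://"
	set replacement_string to ""
	set AppleScript's text item delimiters to search_string
	set text_item_list to every text item of theRawUrl
	set AppleScript's text item delimiters to replacement_string
	set theRawUrl to text_item_list as string
	
	-- Build the address of the Instapaper text version and download its markup
	-- (only "http://" is handled here; the rest of the URL is passed along as-is)
	set theURL to "http://www.instapaper.com/text?u=http%3A%2F%2F" & theRawUrl
	set theSource to download markup from theURL
	set theTitle to get title of theSource
	
	-- Recreate the page as an HTML record in the incoming group
	create record with {type:html, source:theSource, name:theTitle, URL:theURL} in incoming group
end tell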
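
One caveat with the script above: it only strips "http://" and passes the rest of the address along unencoded, so an https link, or a URL with a query string (with "?" and "&" in it), may not reach Instapaper intact. Here is a rough sketch in Python of how the whole address could be percent-encoded instead. The function name instapaper_text_url is made up for illustration, and the endpoint is simply the one used in the script above; I am assuming it accepts a fully encoded URL in the u parameter.

```python
from urllib.parse import quote

# Assumption: the same text-only endpoint the AppleScript above points at.
INSTAPAPER_TEXT_ENDPOINT = "http://www.instapaper.com/text?u="


def instapaper_text_url(page_url: str) -> str:
    """Build the Instapaper text-view address for page_url (hypothetical helper)."""
    page_url = page_url.strip()
    # If the clipboard held a bare "example.com/page", assume plain http.
    if "://" not in page_url:
        page_url = "http://" + page_url
    # safe="" makes quote() encode ":", "/", "?", "&" and "=" as well,
    # so the whole original URL travels as a single query value.
    return INSTAPAPER_TEXT_ENDPOINT + quote(page_url, safe="")


if __name__ == "__main__":
    print(instapaper_text_url("http://example.com/a?b=1&c=2"))
```

If that holds up, the result could be handed to “download markup” in place of theURL, for instance by calling the Python code from the script with do shell script. I have not tried that part in DT.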

Here’s a PHP port of Readability. If I knew how to use and apply PHP I suspect I’d have an easy answer to my question: http://www.keyvan.net/2010/08/php-readability/

Maybe this helps. I wrote my Readability script for DTPO by snarfing the JavaScript that Readability puts in the custom bookmarklet that you can build at Arc90’s site. You could incorporate this into your script, perhaps. Or grab the JavaScript from the bookmarklet for your own flavor.


-- this script implements the Readability bookmarklet in DTPO
-- Readability is provided by Arc90 Labs at http://lab.arc90.com/experiments/readability/
-- Readability is copyright its owner and this script does not modify the code provided by Arc90 at the above link
-- use: open an HTML document, bookmark, or Web Archive in DTPO

set ReadabilityCode to "javascript:(function(){readStyle='style-newspaper';readSize='size-medium';readMargin='margin-narrow';_readability_script=document.createElement('SCRIPT');_readability_script.type='text/javascript';_readability_script.src='http://lab.arc90.com/experiments/readability/js/readability.js?x='+(Math.random());document.getElementsByTagName('head')[0].appendChild(_readability_script);_readability_css=document.createElement('LINK');_readability_css.rel='stylesheet';_readability_css.href='http://lab.arc90.com/experiments/readability/css/readability.css';_readability_css.type='text/css';_readability_css.media='screen';document.getElementsByTagName('head')[0].appendChild(_readability_css);_readability_print_css=document.createElement('LINK');_readability_print_css.rel='stylesheet';_readability_print_css.href='http://lab.arc90.com/experiments/readability/css/readability-print.css';_readability_print_css.media='print';_readability_print_css.type='text/css';document.getElementsByTagName('head')[0].appendChild(_readability_print_css);})();"

tell application id "com.devon-technologies.thinkpro2"
	set theWindow to think window 1
	do JavaScript ReadabilityCode in theWindow
end tell

No, I’ve got that. But you can only invoke the bookmarklet if you first load the page in its original form. I’m trying to eliminate that step: just go straight to the Readability version. It’s a very simple thing for anyone who knows what they’re doing (that would leave me out): just a matter of invoking the Python or PHP ports of Readability for a given URL. I just don’t know how to do that.

There is a script using the Ruby port of Readability here: viewtopic.php?f=20&t=12392