Dear all,
Here’s a much fancier version of the “pdf capture script” I wrote a couple of years ago. I keep it in my ~/Library/Scripts/ folder and assign it a hotkey with FastScripts Lite. Depending on which browser you are using (it has code to support just about all of 'em), it will try to capture a webarchive, a pdf, or a gif/jpg/png, whichever is appropriate to the URL in the current browser window.
It also has a few blocks of code that route your clippings automatically based on the URL (I, for instance, like to keep all of my New York Times clippings together). By default the clippings go to a special “sort me” folder (called “file elsewhere” in the script); alternatively they go to the current group in DEVONthink Pro, provided that the current group carries the first label (red on my computer). This lets me vary the way I clip things easily while only having to remember one hotkey.
This is not, mind you, the simplest version of the “clip me” script, but the most complicated: it does nice things like detecting whether you are already in DEVONthink Pro and, if so, doing you the extra favor of not re-downloading the pdf if it is already in the browser. That can save a minute or so in many circumstances. It needs a bit more error-checking, I suppose; for now, if it can’t get the pdf to work for some reason, it just stores the URL as a link in DEVONthink Pro. I find this preferable to an error message, because I can go try to figure out the problem later from DEVONthink Pro.
-Eric Oberle
p.s. If you don’t use all the browsers I do, you might want to comment out the unwanted browsers in the script. Otherwise, every time you open the script in Script Editor, it will launch every browser the script refers to. That can be a lot of launching…
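To illustrate the commenting-out trick, here is a minimal sketch rather than a piece of the real script: a cut-down, two-browser version of the dispatch in which the OmniWeb branch is disabled with (* *). The stand-in value for front_prog and the choice of Safari as the browser to keep are assumptions made for the example; the real script asks System Events which program is frontmost.

```applescript
-- Sketch only: front_prog is hard-coded here for illustration.
set front_prog to "Safari"
set this_url to ""

if front_prog contains "Safari" then
	tell application "Safari"
		-- Only read the URL if a document is actually open.
		if (exists document 1) then set this_url to URL of document 1
	end tell
	(* Disabled because OmniWeb is not installed on this machine.
else if front_prog contains "OmniWeb" then
	tell application "OmniWeb"
		set this_url to address of browser 1
	end tell
	*)
else
	display dialog "browser unrecognized " & front_prog
end if

this_url
```

Because the whole "else if" branch sits inside the comment, the surrounding if ... else ... end if chain stays intact, and the editor no longer needs to find OmniWeb when it opens or compiles the script.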
(* Capture Webarchive or PDF from current Program
Captures current browser page as a webarchive or a pdf or a gif/jpg file into current database in DevonThink Pro, by trying to guess which kind of file it is.
Written by Eric Oberle, Stanford University, borrowing code from various scripts by Christian Grunenberg for the amazing DevonThink Pro package.
If you reuse code from this script, please put an acknowledgment like this one at the top of your script.
*)
property default_app : "Opera" ----set this variable to the application to use when running manually inside script editor
property target_app : application "DEVONthink Pro"
property the_user_agent : "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418 (KHTML, like Gecko) Safari/417.9.2"
set pdf_no_source to false
set this_title to "rename me"
-----check if dtpro has database open
tell target_app
using terms from application "DEVONthink Pro"
if ((count of databases) is 0) then
display dialog "Please make sure DEVONthink Pro has a database open."
return 1
end if
set this_source to ""
end using terms from
end tell
----figure out which browser is front most, and act appropriately
---note: if there are browsers here that you would never use, comment out the corresponding code blocks
----- by enclosing them in (* *). Otherwise AppleScript editors will launch them whenever you open or compile this script.
tell application "System Events"
set front_prog to displayed name of first process whose frontmost is true
end tell
set front_prog to front_prog as string
if front_prog contains "Script" then set front_prog to default_app
if (front_prog contains "Safari") or (front_prog contains "WebKit") then
using terms from application "Safari"
tell application front_prog
try
if not (exists document 1) then error "Safari seems to have no document open."
set this_url to the URL of document 1
set this_source to the source of document 1
set this_title to the name of window 1
end try
end tell
end using terms from
else if (front_prog contains "DEVONthink") then
tell application front_prog
set this_url to the URL of window 1
set this_title to the name of window 1
set this_source to source of window 1
set theName to name of window 1
end tell
else if (front_prog contains "DevonAgent") then
tell application "DEVONagent"
if not (exists browser 1) then error "DevonAgent seems to have no document open"
set this_url to the URL of browser 1
set this_title to the name of window 1
set this_source to the source of window 1
end tell
else if front_prog contains "Vienna" then
tell application "Vienna"
set this_url to link of current article
set this_title to title of current article
set this_source to documentHTMLSource
end tell
else
----these programs cannot supply the source, so we'll have to download it.
if front_prog contains "Camino" then
using terms from application "Camino"
tell application "Camino"
set this_url to URL of window 1
end tell
end using terms from
else if front_prog contains "Firefox" then
using terms from application "Firefox"
tell application "Firefox"
set this_title to «class pTit» of window 1
set this_url to «class curl» of window 1
end tell
end using terms from
else if front_prog contains "Opera" then
using terms from application "Opera"
tell application "Opera"
set this_url to URL of document 1
set this_title to name of document 1
end tell
end using terms from
else if front_prog contains "OmniWeb" then
using terms from application "OmniWeb"
tell application "OmniWeb"
if not (exists browser 1) then error "No browser is open."
set this_url to address of browser 1
end tell
end using terms from
(* else if front_prog contains "NetNews" then
using terms from application "NetNewsWire"
tell application "NetNewsWire"
set tab_num to index of selected tab
if (tab_num is greater than 0) then
set some_urls to URLs of tabs
set this_url to item (tab_num + 1) of some_urls
set tab_titles to titles of tabs
set this_title to item (tab_num + 1) of tab_titles
else
set this_url to get URL of selectedHeadline
if this_url is "" then error "Please make sure you have a web page in view in Netnewswire"
end if
end tell
end using terms from
*)
else
display dialog "Unrecognized browser: " & front_prog
return
end if
------download source if necessary
set pdf_no_source to true
(*try
tell application "DEVONthink Pro"
set this_source to download markup from this_url agent the_user_agent
end tell
set pdf_no_source to true
end try *)
end if
-------Determine where in database to store incoming
-------weblinks. Currently, I have it place New York Times files in a special
-------folder, and I have a rule that if the current devonthink folder is tagged with the "red" label
-------(Label 1), then I have devonthink store the pdf in that folder.
-------- Feel free to customize this code to your needs.
tell target_app
using terms from application "DEVONthink Pro"
try
if current group is "current application" then
set cuPos to {}
else
set cuPos to {current group}
end if
on error
try
set cuPos to selection of think window 1
on error
set cuPos to {}
end try
end try
(*
----uncomment these lines to have the script always put all pdfs in the same place
set cuPos to get record at "/file elsewhere" in current database
set cuPos to {cuPos}
------then comment out the "special handling" block below
*)
-------begin special handling code
-----special location for urls from the New York Times
if this_url contains "nytimes.com" or this_url contains "-nyt." then
---create storage place for imported records
if not (exists record at "/file elsewhere/NewYorkTimes") then
set import_location to create location "/file elsewhere/NewYorkTimes"
else
set import_location to (get record at "/file elsewhere/NewYorkTimes" in current database)
end if
if cuPos is "current application" then set cuPos to {}
--if the current group (or selection) carries label 1 (red), file the clipping there
else if (cuPos is not {}) and (label of first item of cuPos is 1) then
if kind of item 1 of cuPos is "Group" then
set import_location to first item of cuPos
else if parent of item 1 of cuPos is not {} then
set import_location to last item of parent of item 1 of cuPos
else
set import_location to root of current database
end if
else
---otherwise, just put the clipping into the general "/file elsewhere" folder.
set import_location to (get record at "/file elsewhere" in current database)
set cuPos to import_location
set cuPos to {cuPos}
end if
------end special handling code
end using terms from
end tell
----------------------------------
--capture webarchive & source of original link (if the original front program was DEVONthink or another browser that has already
---given us the source (i.e. DEVONagent, Safari), then don't reload it)
----------------------------------
tell application "DEVONthink Pro"
if (front_prog contains "DevonThink Pro") then
set pdf_no_source to false
try ------Force an error condition if no source in current link. This usually means a pdf or a gif file is loaded in window.
if front_prog contains "DevonThink" then
set this_source to source of window 1
set the_archive to webarchive of window 1
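---use the variables set earlier in the script (this_title, this_url, this_source); the_title, the_url and the_comment were never defined
set the_record to create record with {name:this_title, type:html, URL:this_url, source:this_source} in import_location
set the data of the_record to the_archive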
end if
on error
set pdf_no_source to true
end try
if pdf_no_source then
using terms from application "DEVONthink Pro"
tell application front_prog
try
with timeout of 400 seconds
repeat while loading of window 1
delay 1
end repeat
set pdf_record to create record with {name:this_title, type:picture, URL:this_url} in import_location
get PDF of think window 1
set data of pdf_record to result
return
end timeout
on error
try
try ---pdf_record may not exist yet if the failure happened before it was created
delete record pdf_record
end try
set link_record to create record with {name:this_title, type:link, URL:this_url} in import_location
set this_window to open window for record link_record
repeat while loading of this_window is true
delay 1
end repeat
set pdf_record to create record with {name:this_title, type:picture, URL:this_url} in import_location
get PDF of this_window
set data of pdf_record to result
close window this_window
delete record link_record
return ---done; otherwise the capture block further down would run a second time
on error
set pdf_record to create record with {name:"failed: " & this_title, type:link, URL:this_url} in import_location
return
end try
end try
end tell
end using terms from
end if
else
try
tell application "DEVONthink Pro"
if this_source is "" then
log "no source" & this_source
set this_source to download markup from this_url agent the_user_agent
log this_source
end if
if this_source does not contain "head" or this_source does not contain "html" then
set pdf_no_source to true
else
log "the source length is " & length of this_source
set this_title to get title of this_source
with timeout of 300 seconds
set the_archive to download web archive from this_url agent the_user_agent
end timeout
set the_record to create record with {name:this_title, type:html, URL:this_url, source:this_source} in import_location
set the data of the_record to the_archive
return
end if
end tell
on error
try
if this_source is "" then
log "still no source"
set this_title to get title of this_source
set the_archive to download web archive from this_url
set the_record to create record with {name:this_title, type:html, URL:this_url, source:this_source} in import_location
set the data of the_record to the_archive
return
else
set the_record to create record with {name:this_title, type:html, URL:this_url, source:this_source} in import_location
end if
on error
log "fell through"
set pdf_no_source to true
end try
end try
end if
end tell
-----If this is a pdf or an image file, then capture it.
if pdf_no_source then
if front_prog contains "DevonThink" then
using terms from application "DEVONthink Pro"
tell application front_prog
try
with timeout of 400 seconds
repeat while loading of window 1
delay 1
end repeat
set pdf_record to create record with {name:this_title, type:picture, URL:this_url} in import_location
get PDF of think window 1
set data of pdf_record to result
return
end timeout
on error
try
try ---pdf_record may not exist yet if the failure happened before it was created
delete record pdf_record
end try
set link_record to create record with {name:this_title, type:link, URL:this_url} in import_location
set this_window to open window for record link_record
repeat while loading of this_window is true
delay 1
end repeat
set pdf_record to create record with {name:this_title, type:picture, URL:this_url} in import_location
get PDF of this_window
set data of pdf_record to result
close window this_window
delete record link_record
on error
set pdf_record to create record with {name:"failed: " & this_title, type:link, URL:this_url} in import_location
return
end try
end try
end tell
end using terms from
else
tell application "DEVONthink Pro"
using terms from application "DEVONthink Pro"
---text comparisons ignore case by default, so this also matches .PDF, .GIF, etc.
if not ((this_url ends with ".pdf") or (this_url ends with ".gif") or (this_url ends with ".jpg") or (this_url ends with ".jpeg") or (this_url ends with ".png")) then
create record with {name:"file format not clear: " & this_title, type:link, URL:this_url} in import_location
end if
set z to download URL this_url agent the_user_agent
if this_title is "" then
set this_title to "rename-this-pdf"
end if
try
with timeout of 500 seconds
set pdf_record to create record with {name:this_title, type:picture, source:this_source, URL:this_url} in import_location
set data of pdf_record to z
end timeout
on error
set pdf_record to create record with {name:"failed: " & this_title, type:link, URL:this_url} in import_location
return
end try
return
end using terms from
end tell
end if
end if
tell application "System Events"
beep
---we're finished!
end tell
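A possible refinement, not part of the script above: the URL-based routing and the file-extension test could each be pulled into a small handler, so that adding a new site or file type means editing one list. The names routing_rules, default_path, path_for_url and is_pdf_or_image are hypothetical, made up for this sketch; the group paths are the ones the script already uses.

```applescript
-- Hypothetical routing table: each rule is {text to look for in the URL, destination group path}.
-- The two New York Times rules mirror the test in the script; add your own pairs.
property routing_rules : {{"nytimes.com", "/file elsewhere/NewYorkTimes"}, {"-nyt.", "/file elsewhere/NewYorkTimes"}}
property default_path : "/file elsewhere"

-- Return the destination path of the first rule whose text occurs in the URL,
-- or default_path if no rule matches.
on path_for_url(this_url)
	repeat with a_rule in routing_rules
		if this_url contains (item 1 of a_rule) then return (item 2 of a_rule) as text
	end repeat
	return default_path
end path_for_url

-- True if the URL ends with a pdf or image extension.
-- AppleScript ignores case in text comparisons by default, so ".PDF" matches ".pdf".
on is_pdf_or_image(this_url)
	repeat with an_ext in {".pdf", ".gif", ".jpg", ".jpeg", ".png"}
		if this_url ends with (contents of an_ext) then return true
	end repeat
	return false
end is_pdf_or_image

path_for_url("http://www.nytimes.com/some/article.html") -- "/file elsewhere/NewYorkTimes"
```

Inside the tell block for DEVONthink Pro, the special handling code could then presumably shrink to a single line such as set import_location to create location (my path_for_url(this_url)). The "my" is needed because the handler belongs to the script, not to DEVONthink. I have not tested that replacement against the label-based branch, so treat it as a starting point.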