Capture a PDF to Devonthink from almost anywhere

Hello all,

Here’s a script I use to capture the front document of whatever app I am using and file it as a PDF in DEVONthink. Right now it works mostly with browsers (Camino, Safari, DEVONagent, etc.) and PDF readers (Skim, PDFpenPro, and, kind of, Preview.app). I bind the script to a hotkey, and it tries to figure out whether I’m browsing, editing a PDF, looking at a web page, etc., and does the right thing. As a bonus, if the PDF is not already OCRed, it will do that too. Great for using JSTOR or the like. More applications can be added pretty easily if you tell me what you use. Hopefully you will find the script handy…

erico

---- PDF-Archive current document
----version 1.1c by Eric Oberle
----This script attempts to detect which program is currently the frontmost, and attempts to send a pdf of that document to Devonthink Pro.  The goal of the script is to allow you to have a single key to archive all files. 
--it is best used with a "hot key" program, such as red sweater software's "Fastscripts"

----it should work with Preview.app (most of the time), Skim, Pdfpenpro for pdfs, and it should work with DevonAgent, Safari, Omniweb, Camino, Firefox, and Opera.  If any one of these browsers has a webpage open instead of a pdf, the script will attempt to make a pdf out of that webpage. 

---the script currently expects that you have installed growl.

---note: if there are some of these  browsers that you would never use, feel free to comment out that code block by enclosing it with (*   *).  Otherwise applescript editors will open them whenever you try to edit this script.

property app_to_use_if_in_script_editor : "Skim"
property use_growl : true

----figure out which program is frontmost
tell application "System Events"
	set front_prog to displayed name of first process whose frontmost is true
end tell
set front_prog to front_prog as string
tell application "DEVONthink Pro"
	try
		set insert_location to incoming_group
	on error
		try
			set cur_sel to selection
			set insert_location to parent of cur_sel
		on error
			set insert_location to incoming group
			--set insert_location to root of database 1
		end try
	end try
	set the_Window to window 1
end tell
set insert_location_string to name of insert_location



----This variable is set above to ease debugging. 
if front_prog contains "Script" then set front_prog to app_to_use_if_in_script_editor


---Do the right thing to create the PDF
if (front_prog contains "Safari") or (front_prog contains "WebKit") then
	using terms from application "Safari"
		tell application front_prog
			
			if not (exists document 1) then error front_prog & " seems to have no document open."
			set the_url to the URL of document 1
			set the_title to the name of window 1
			
			try
				get source of document 1
				set the_source to result
			on error message
				log message
				set the_source to "PDF"
			end try
			
		end tell
	end using terms from
	my create_pdf(the_source, the_title, the_url, "", insert_location, true)
	
	
else if (front_prog contains "DEVONthink") then
	tell application "DEVONthink Pro"
		set the_url to the URL of the_Window
		set the_title to the name of the_Window
		try
			set the_source to source of the_Window
			if the_source is not "" then set the_title to get title of the_source
		end try
		set pdf_content to get paginated PDF of the_Window
		-- file the new record in the chosen group rather than the default location
		set new_pdf to create record with {type:picture, URL:the_url, name:the_title} in insert_location
		set data of new_pdf to pdf_content
		my growl_notify("Completed", "PDF filed", the_title & " to " & insert_location_string, true)
	end tell
	
	
else if (front_prog contains "DevonAgent") then
	tell application "DEVONagent"
		if not (exists browser 1) then error "DevonAgent seems to have no document open"
		
		-- use the same variable names that the DEVONthink block below expects
		set the_url to the URL of browser 1
		set the_title to the name of window 1
		set the_source to the source of window 1
		-- ask DEVONagent's own front browser for the PDF, not DEVONthink's window
		set pdf_content to get paginated PDF of browser 1
	end tell
	tell application "DEVONthink Pro"
		set new_pdf to create record with {type:picture, URL:the_url, name:the_title} in insert_location
		set data of new_pdf to pdf_content
		my growl_notify("Completed", "PDF filed", the_title, true)
	end tell
	
	
else if front_prog contains "Camino" then
	using terms from application "Camino"
		tell application "Camino"
			set the_url to URL of browser window 1
			set the_name to name of browser window 1
		end tell
	end using terms from
	my create_pdf("", the_name, the_url, "", insert_location, true)
	
	
else if front_prog contains "firefox-bin" then
	using terms from application "Firefox"
		tell application "Firefox"
			set the_name to «class pTit» of window 1
			set the_url to «class curl» of window 1
		end tell
	end using terms from
	my create_pdf("", the_name, the_url, "", insert_location, true)
	
else if front_prog contains "Opera" then
	using terms from application "Opera"
		tell application "Opera"
			set myInfo to GetWindowInfo of window 1
			set the_url to item 1 of myInfo
			set the_name to item 2 of myInfo
		end tell
	end using terms from
	my create_pdf("", the_name, the_url, "", insert_location, true)
	
	(* else if front_prog contains "NetNews" then
	using terms from application "NetNewsWire"
		tell application "NetNewsWire"
			set tab_num to index of selected tab
			if (tab_num is greater than 0) then
				set some_urls to URLs of tabs
				set the_url to item (tab_num + 1) of some_urls
				set tab_titles to titles of tabs
				set the_title to item (tab_num + 1) of tab_titles
			else
				set the_url to get URL of selectedHeadline
				if the_url is "" then error "Please make sure you have a web page in view in Netnewswire"
				
			end if
		end tell
		my create_pdf("", the_name, the_url, "", insert_location, true)
	end using terms from  *)
else if front_prog contains "Vienna" then
	tell application "Vienna"
		set the_url to link of current article
		set the_name to title of current article
		set the_source to documentHTMLSource
	end tell
	my create_pdf(the_source, the_name, the_url, "", insert_location, true)
	
else if front_prog contains "OmniWeb" then
	using terms from application "OmniWeb"
		tell application "OmniWeb"
			if not (exists browser 1) then error "No browser is open."
			
			set the_url to get address of active tab of browser 1
			set the_name to get name of browser 1
			set this_source to do script "document.body.innerHTML" window browser 1
		end tell
		my create_pdf("", the_name, the_url, "", insert_location, true)
		
	end using terms from
else if front_prog contains "PDFPen" then
	tell application front_prog
		set the_path to path of document 1
		set the_name to name of document 1
	end tell
	-- pass the bare file path; the " to <group>" suffix belongs in a notification message, not in the path
	my create_pdf("PDF", the_name, "", the_path, insert_location, true)
	
else if front_prog contains "Skim" then
	
	tell application front_prog
		
		set the_document to document of window 1
		set the_path to path of the_document
		set the_name to name of the_document
		set the_source to "PDF"
		my create_pdf(the_source, the_name, "", the_path, insert_location, true)
		
	end tell
	(*  Disabled because Acrobat 9 applescript is broken

else if front_prog contains "Acrobat Pro" then
	tell application front_prog
		using terms from application "Adobe Acrobat Pro"
			set the_document to document 1
			set the_name to name of the_document
			set myScript to "this.saveAs(\"/var/tmp/" & the_name & ".pdf\", 
\"com.adobe.acrobat.pdf\");" as string
			
			do script myScript
			
			---set save_alias to POSIX file "/var/tmp" & the_name
			---save the_document to save_alias
			
			set the_source to "PDF"
			set the_path to ("/var/tmp/" & the_name)
		end using terms from
	end tell
	my create_pdf(the_source, the_name, "", the_path, insert_location, true)
	*)
else if front_prog contains "Preview" then
	----this block of code makes the Preview.app program scriptable
	---it is a hack, but seems to work.  Thanks to Daniel Jalkut
	---http://www.red-sweater.com/blog/150/minimal-scriptability
	set the_result to ""
	try
		tell application "Finder"
			set the Preview_app to (application file id "com.apple.Preview") as alias
		end tell
		set the plist_filepath to the quoted form of ((POSIX path of the Preview_app) & "Contents/Info")
		
		set the_script to "defaults read " & the plist_filepath & space & "NSAppleScriptEnabled  2>&1"
		set the_result to do shell script the_script
	on error the_result
		set the_result to the_result
	end try
	try
		if the_result contains "does not exist" then
			log (do shell script "defaults write " & the plist_filepath & space & "NSAppleScriptEnabled -bool YES")
			my growl_quick("I just attempted to make Preview.app scriptable.  You might need to quit that program and start it again in order for this to work.")
		end if
	end try
	
	---gets all windows
	tell application "Preview"
		set preview_docs to {}
		repeat with this_win in windows
			set end of preview_docs to document of this_win
		end repeat
	end tell
	
	
	repeat with thisDoc in preview_docs
		set the_name to "unknown document"
		try
			set the_name to name of thisDoc
		end try
		try
			set the_path to POSIX file (path of thisDoc) as Unicode text
			
			set the_source to "PDF"
			my create_pdf(the_source, the_name, "", the_path, insert_location, true)
		end try
	end repeat
else
	activate
	display dialog "I'm sorry this script does not know how to handle application " & (front_prog as string)
	
end if






on create_pdf(the_source, the_name, the_url, the_path, insert_location, notify)
	tell application "DEVONthink Pro"
		log "path = " & the_path
		if the_source = "PDF" and the_path is not "" then
			with timeout of 200000 seconds
				set new_pdf to import (POSIX path of the_path) to insert_location type "PDF"
				if kind of new_pdf is not "PDF+Text" then
					convert image record new_pdf
				end if
			end timeout
		else ---here all cases where we have to download something
			if the_source is not equal to "PDF" and the_url is not equal to "" then
				set temp_page to create record with {type:bookmark, URL:the_url, source:the_source}
				
			else
				with timeout of 4000 seconds
					set pdf_data to download URL the_url
				end timeout
				set temp_page to create record with {type:picture, URL:the_url}
				set data of temp_page to pdf_data
			end if
			log the_source
			log the_url
			
			set the_Window to open window for record temp_page
			set visible of the_Window to false
			repeat while loading of the_Window
				delay 1
			end repeat
			with timeout of 20000 seconds
				if the_name is "" then set the_name to get name of the_Window
				set pdf_content to get paginated PDF of the_Window
				-- file the finished PDF in the group the caller asked for
				set new_pdf to create record with {type:picture, URL:the_url, name:the_name} in insert_location
				set data of new_pdf to pdf_content
				--			my growl_notify("Completed", "PDF filed", the_name, true)
				close the_Window
				delete record temp_page
			end timeout
		end if
		
	end tell
	if notify is true then
		-- the parentheses are required to actually call the handler and register with Growl
		my growl_initialize()
		my growl_notify("Completed", "PDF filed", the_name, use_growl)
	end if
	return new_pdf
end create_pdf






on growl_initialize()
	
	tell application "GrowlHelperApp"
		-- Make a list of all the notification types 
		-- that this script will ever send:
		set the allNotificationsList to ¬
			{"Starting", "Completed"}
		set the enabledNotificationsList to ¬
			{"Starting", "Completed"}
		
		register as application ¬
			"Devonthink Scripting" all notifications allNotificationsList ¬
			default notifications enabledNotificationsList ¬
			icon of application "Devonthink Pro"
	end tell
end growl_initialize

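on growl_notify(notification_type, the_title, the_message, should_notify)
	-- Callers pass use_growl (or true) as the fourth argument; skip Growl entirely when it is not true.
	if should_notify is not true then return
	
	tell application "GrowlHelperApp"
		if notification_type is "Completed" then
			
			notify with name ¬
				"Completed" title ¬
				the_title description ¬
				the_message application name "Devonthink Scripting"
			
		else
			-- any other type (including "Generic") is sent under the registered "Starting" name
			notify with name ¬
				"Starting" title ¬
				the_title description ¬
				the_message application name "Devonthink Scripting"
		end if
		
	end tell
end growl_notify

on growl_quick(the_message)
	my growl_notify("Generic", "Attention:", the_message, use_growl)
end growl_quick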


note: safari bug fixed 10/26

I’m just using the print dialog and then the “Save to DEVONthink” or (more frequently) “Save to Printed” scripts. This has the added benefit of letting me look at the preview before I print, to see how it’s going to look (e.g. printing from some web pages is cluttered unless you use the “print” version of the page).

How does this script differ from using the built-in “Print to PDF” features? This is one area where I think DT currently works really well. The two scripts in ~/Library/PDF Services are great. I went to that location and hid the suffix so the menu items don’t show the .scpt extension.

Well, I agree those scripts are nice. This script is partly meant to save all the clicking that they entail, but it also avoids a few things that they do, runs in the background with one key press, and adds one nice feature: OCR.

The biggest deal, however, is what they don’t do. First, if you found a PDF behind an authenticated site (I do a lot of this), you can tell Safari to “open pdf in Preview.app” or “Skim.app” (my favorite), and then let this script capture it from there. Obviously, if I had the foresight to always begin all my searches in devonagent or dtpro, this wouldn’t be an issue, but I often don’t.

Another “doesn’t do” part is also important. Because this script doesn’t send things through the print system, it won’t alter an existing PDF according to the rules of page setup and the dimensions of your default printer and all that. It keeps the original pdf properties. And it also encodes into dtpro the original URL link in case you want to go back to the site (The URL is placed in the URL metadata). It would for this reason be very easy to add a check in this script to see if a pdf with this url has already been added. The final bit of metadata preserved is filename–this script tries to keep the original file name if it can determine it.
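A minimal sketch of the duplicate check mentioned above, assuming DEVONthink Pro's scripting dictionary provides the “lookup records with URL” command (worth confirming in Script Editor's dictionary viewer for your version). The handler name “already_filed” is hypothetical and not part of the posted script.

```applescript
-- Hypothetical helper, not part of the original script: returns true when the
-- current database already holds at least one record whose URL matches the_url.
on already_filed(the_url)
	-- PDFs captured from disk are passed in with an empty URL, so there is nothing to match.
	if the_url is "" then return false
	tell application "DEVONthink Pro"
		-- Assumption: "lookup records with URL" exists in this DEVONthink version
		-- and returns a (possibly empty) list of matching records.
		set matches to lookup records with URL the_url
		return (count of matches) > 0
	end tell
end already_filed
```

One way to use it would be to call it near the top of create_pdf and skip the capture when it returns true, for example: if my already_filed(the_url) then return missing value.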

Finally, there is the “only does if necessary” part. This script will tell dtpro to ocr the pdf if it needs to be ocred, and not do it if it doesn’t need to be. (not a big deal for a 2 page document…but, well, you can imagine.) And it is of course foolish to ocr an article that may have been ocred on a higher resolution before being published…

It’s mainly for these subtle reasons that I prefer it, especially when dealing with big PDF files (downloaded articles and such), but I agree with you: the fact that dtpro does such a good job of these things is why I tried to improve on it.

Finally, the script can be tweaked to create webarchives, if you prefer those. Maybe not everyone’s cup of tea, but it’s easily customizable from this point…

-erico

Thanks erico! I agree it’s great to preserve the URL and associated metadata.

I pasted the script into Script Editor but when I try to run I get error (null) and when I try to compile, it gives “Syntax Error: Expected end of line but found class name.”

The highlighted word is the first instance of “window” in this section of code (in the line beginning with “set the_url”):

else if front_prog contains "Camino" then
   using terms from application "Camino"
      tell application "Camino"
         set the_url to URL of browser window 1
         set the_name to name of browser window 1
      end tell
   end using terms from
   my create_pdf("", the_name, the_url, "", insert_location, true)

Any suggestions to get this working?

What version of Camino are you using? I’d try updating to the newest version and see if that solves the problem…they recently changed the applescript vocabulary, and I think this script relies on that…but tell me the version, and we’ll see…

-erico

I don’t have Opera, Vienna or Camino. Can you help me with customizing your script, please?

Hi

It’s working quite well for me, so thanks a lot!

Just two questions:

  1. The script would always stop to ask me to locate Opera (which I don’t use). I’ve just commented out the section on Opera.

  2. Growl, which I have and works well otherwise, doesn’t give me any feedback (which is a bit disconcerting, since I don’t know if the import has worked).

Thanks again and all the best

Hendrix

Hi

The script keeps malfunctioning with Growl. I always get an error message saying

Error number: -609
Message: GrowlHelperApp got an error: Connection is invalid

And Growl doesn’t “growl” at all.

Any ideas?

Thanks