Script for grabbing highlighted text from pdf?

Hi, I’m in law school and I use DTPro for reading pdf files of articles, case briefs and so on. I’ve been highlighting the pdfs and marking them up within DTPro. I can copy the text and paste it into a separate text file, but I was wondering if there is a script (or whether this is even scriptable) to have the text I highlight within a pdf dumped into a single file. This would be terrific for taking notes in prep for papers etc. Even better would be a link back to the pdf the text was grabbed from.

There are probably multiple ways to do this, but the one that comes quickly to mind is using the Services menu. Click “Take Rich Note” [Command-)] for the first passage and “Append Note” [Command-@] for all subsequent passages. The result will be a rich text note containing all selected passages, sitting in the Global Inbox. The link to the pdf will be at the top of the note.



DEVONthink Pro doesn’t support this, but maybe Skim does.

Tried what you suggested but can’t see the link to the pdf. Am I missing something?

My bad, you are right. You can, however, copy a link to the pdf and paste it at the top of your rtf notes.


Here’s an experimental script that I use to do this. If you read the script code, you’ll see that this application of scripting has been somewhat of an obsession of mine, and I hope not too much of an annoyance to Christian…

It isn’t perfect, and there are a few limitations in devonthink’s scripting routines that make it a little quirky. The two worst of these limitations are: 1) you can’t add links to an rtf file without that window being open, so this script flashes a window open before quickly closing it. It’s annoying to look at, but it works. 2) since there is no way for scripts to know where a document window (i.e. the type of window you get when you double click) was opened from, it is unclear in which parent (i.e. which folder) this note-taking script should store the note. I therefore generally use the script in 3-pane mode, while holding out the hope that devonthink will someday allow document windows to report the group from which they were activated. These requests for scripting command features are littered through the script as comments.

Oh, this script only makes sense if you make it hot-key activated. Name it note_maker___ctrl-n.scpt or something like that, so you can use control-n to activate it.

One last thing: I think this script requires Growl to be installed. Feel free to comment those lines out if that bothers you.

Give it a try and tell me what you think.


----note pad script 1.2 by eric oberle

---this script is intended to help you take notes.  The idea is that you assemble your documents (or replicants of your documents) in a folder, start reading in three pane view and select text snippets within individual documents that you wish to "snippetize" into a "note file" that will be stored in the same folder as the documents/replicants.  This note file will include the selected text plus a link to the original document.  Right now it works, but a few of the quirks in dtpro scripting limit its capabilities, the worst of which being that the script loses the document and the text selection focus.

---If you are working in three pane mode and invoke this script with some text selected, the script will create a rich text file called <name of group>--reading notes, with the selected text pasted in. If such a record already exists, the script will append the selected text to the current record. At the top of every quoted text, the script attempts to put a link to the original document.  

----The purpose of the script is to show some possibilities of what devonthink 2.0 can do, and shows some things I would like to see work.  One major issue was fixed in pb2: the ability to  preserve the current text and record selection (retaining focus) when modifying a record in the same folder. A major feature making this possible was added in pb6: the ability to add search strings to links (yeah!!!) 

----I have put FF for "future feature" in places where I'd like to see devonthink pro's scripting changed to make this work a bit more seamlessly.  
---Many thanks to Christian Grunenberg for listening and including so many great features!

property growl_notify : false
try
	set last_date to date "Thursday, January 1, 1970 12:00:00 AM"
on error
	set last_date to 0
end try

tell application "DEVONthink Pro"
	set orig_win to window 1
	---collect important information
	set the_doc to content record
	set cur_doc_name to name of the_doc
	set cur_uuid to uuid of the_doc
	set cur_doc_url to URL of the_doc
	set target_group to current group
	set cur_root to root of current database
	set current_page to current page of orig_win
	---set noted_win to open window for record the_doc
	set the_sel to {selected text of orig_win}
	log first item of the_sel
	try
		if first item of the_sel is "" then
			--nothing selected: ask the user to type a note instead
			set the_sel to text returned of (display dialog "Please select some text or " default answer "enter your note here")
		else
			set the_sel to first item of the_sel
		end if
	on error
		set the_sel to text returned of (display dialog "Please select some text or " default answer "enter your note here")
	end try
	--attempt to determine group of window
	set window_class to class of window 1
	---If the user has a "document window" open , we have no idea what the "natural" destination for incoming note-taking records would be. 
	--This routine tries to figure out where it was opened from, so we know where to put the notes file. The best way I can think of is to look through the groups where the notated file is replicated and choose the one that was most recently opened.  
	--future feature:  -it would be nice here if each document window had a property "opened from parent" so that we could just read the target group  information .  
	--if type of window 1 is document then set target_group to opened from parent of window 1 
	if window_class is document window then
		set noted_win to window 1
		set the_parents to every parent of the_doc
		repeat with this_parent in the_parents
			if opening date of this_parent > last_date then
				set latest_opened_parent to this_parent
				set last_date to opening date of this_parent
			end if
		end repeat
		--set parent_of_current to first parent of the_doc
		set target_group to latest_opened_parent
		---the other option is that the user is looking in a three pane view...		
	else if window_class is viewer window then
		set target_group to current group
		set cur_root to root of current database
		if target_group is cur_root then
			log "group is root"
			set target_group to root of current database
		end if
	end if
	--I would like to be able to find out the offset of the selected text in the current document (not just the text)
	--FF  set x to get properties of the_sel
	--FF  set x to get offset of the_sel 
	--FF set x to get span of the_sel ===> characters 50 through 100 of text of window 1
	--FF  set x to get page number  of the sel  
	--It would be nice if one could detect what kind of view the user has active in the current viewer window. 	
	---if view mode of the_win is Three-pane or if type of window 1 is document else display dialog "You should select text in either three-pane or document view in order to use this script." 
	--set ff to get view mode of the_win ===> [List,Icons, Three-Pane, Columns, Split]
	---see if there is a snippets note in target group, if not, create it.
	set group_name to name of target_group
	set notes_name to "/" & group_name & "-notes"
	--test if snippets_note exists
	try
		set note_taking_rec to first child of target_group whose name is notes_name
		set the_rec to first child of target_group whose name is notes_name
	on error
		set note_taking_rec to {}
	end try
	if note_taking_rec is {} then
		set note_taking_rec to create record with {type:rtf, name:notes_name, rich text:(return & " " as styled text)} in target_group
		delay 4
		--Right now there is no way to embed a devonthink URL (target plus tag) into  a rich text variable or create a text container without having a window open.  
		--So open the window...and put stuff in the ugly way
		--FF: it would be nice to be able to embed URLS onto the clipboard at least;
		--FF: it would also be nice one could specify a window be opened invisibly, instead of opening it first and then making it invisible.
	else --snippets note exists
		log the_rec
		set current_text to rich text of note_taking_rec
		(*	if type of the_rec is "rich text" then
			set current_text to rich text of note_taking_rec
		else
			set current_text to plain text of note_taking_rec
		end if *)
		log "current text of note " & current_text
	end if
	---set noted_win to open window for record the_doc
	set snippet_win to open window for record note_taking_rec ---with {visible:false}
	---the following code is updating the "note pad" note and trying to put a link to the original (selected text).  
	--you must have wikilinks turned on in the Devonthink Control Panel for this to work.
	try
		set current_page to current_page as string
		if current_page is not "-1" then
			--page number known: link to the page and add the search string as a second query parameter
			set target_link to "x-devonthink-item://" & (cur_uuid as text) & "?page=" & (current_page as string) & "&search=" & my shorter_link(the_sel, 8)
		else
			--no page number available: link with the search string only
			set target_link to "x-devonthink-item://" & (cur_uuid as text) & "?search=" & my shorter_link(the_sel, 8)
		end if
		--it would be nice if we could include both variables, page number and search value.  But right now DT does not supply current page number to applescript.
		tell text of snippet_win
			make new paragraph with data (return & "---------------" & return) at end
			make new paragraph with data (return & "link to " & cur_doc_name) at end
			set URL of last paragraph to target_link
			make new paragraph with data (return & "---------------" & return & the_sel & return) at end
		end tell
		if orig_win is not snippet_win then close snippet_win with saving
		if growl_notify then my growlNotification("DEVONthink Pro", "added snipped link ", "")
	on error errormsg
		if growl_notify then my growlNotification("DEVONthink Pro", "error in adding snip ", errormsg)
	end try
	if growl_notify then my growlNotification("DEVONthink Pro", "window closing..." & name of target_group, "")
	--optional code to replicate the note link to every group where the original record lives (uncomment the block below to make this work)
	(*
	if window_class is document window then
		set the_parents to every parent of the_doc
		if (count of the_parents) > 1 then
			set the_parents to items 2 through -1 of the_parents
			repeat with this_parent in the_parents
				set parent_name to name of this_parent
				log parent_name
				set test_rec to (every child of this_parent whose name is notes_name)
				if test_rec is {} then
					replicate record note_taking_rec to this_parent
				end if
			end repeat
		end if
	end if
	*)
	set text_to_add to the_sel as Unicode text
	---the following commands at present give no errors but do nothing.  I would like to see them work.  How nice it would be to restore the selection once it has been messed up by record creation/window opening, etc. 
	---set index of orig_window to 1
	my growlNotification("DEVONthink Pro", "Note taken to " & name of target_group, text_to_add)
	set snippet_win to open window for record note_taking_rec
	set index of snippet_win to 2
	set index of orig_win to 1
	hide progress indicator
end tell

on growlNotification(growlIcon, growlTitle, growlDescrip)
	if application "GrowlHelperApp" is running then
		set appName to "ericsdtnotify"
		set notifs to {growlTitle}
		tell application "GrowlHelperApp"
			register as application ¬
				appName all notifications notifs ¬
				default notifications notifs ¬
				icon of application growlIcon
			notify with name growlTitle title growlTitle description growlDescrip application name appName
		end tell
	end if
	return ""
end growlNotification

on shorter_link(the_text, max_words)
	---shortens link to contain the maximum number of words passed. 
	set shorter_text to ""
	set cr to "\n"
	if ((count words of the_text) is greater than max_words) then
		set all_words to (words 1 through max_words of the_text)
		repeat with the_word in all_words
			set shorter_text to shorter_text & the_word & " "
		end repeat
		--drop the trailing space
		set shorter_text to text 1 thru -2 of shorter_text
		log "shortened to " & shorter_text
	else
		set shorter_text to the_text
	end if
	--strip everything from the first line break onward
	set offs to offset of cr in shorter_text
	if offs > 1 then set shorter_text to text 1 thru (offs - 1) of shorter_text
	return shorter_text
end shorter_link
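
The link-building step in the script above can be pulled out into a small handler of its own. What follows is only a sketch, not part of the original script: make_item_link and replace_text are hypothetical names, the handler takes the page number as an integer (with -1 meaning "unknown", as in the script), and it assumes that the page and search parameters can be combined with "&" and that spaces in the search string may be written as %20. Neither assumption is confirmed anywhere in this thread.

```applescript
-- Hypothetical helper (not in the original script): assemble an item link
-- from a uuid, an optional page number and an optional search string.
on make_item_link(item_uuid, page_number, search_text)
	set the_link to "x-devonthink-item://" & item_uuid
	set query_parts to {}
	-- -1 is treated as "page unknown", mirroring the check in the script
	if page_number is not -1 then set end of query_parts to "page=" & page_number
	-- Assumption: spaces in the search string can be encoded as %20
	if search_text is not "" then set end of query_parts to "search=" & my replace_text(search_text, " ", "%20")
	if query_parts is not {} then
		-- Join the parameters with "&" and restore the delimiters afterwards
		set old_tids to AppleScript's text item delimiters
		set AppleScript's text item delimiters to "&"
		set the_link to the_link & "?" & (query_parts as text)
		set AppleScript's text item delimiters to old_tids
	end if
	return the_link
end make_item_link

-- Hypothetical helper: replace every occurrence of find_str with repl_str
on replace_text(the_text, find_str, repl_str)
	set old_tids to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find_str
	set the_items to text items of the_text
	set AppleScript's text item delimiters to repl_str
	set new_text to the_items as text
	set AppleScript's text item delimiters to old_tids
	return new_text
end replace_text
```

In the script this would be called with something like my make_item_link(cur_uuid, -1, my shorter_link(the_sel, 8)). Collecting the parameters in a list first means the link never ends up with two question marks, whichever parameters happen to be present.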

I was playing around with this problem this weekend. My solution was to create a new smart template based on “Annotation.” All I did was change the rtf document for that template to look something like the template “quote (from clipboard).” This is what the template included:

Document %documentLink%
page #

I gave the template a keyboard shortcut, so to take a clipping I select the text, copy it, then invoke this template. The result is a note in an annotations folder with the quote and a link back to the original document. It also changes the URL of the original document to link to this quote. After I finish reading a given source I would probably move all the notes out of the annotation folder to their own group. It’s not perfect, but it seems like the smart template has a lot of potential for doing what we all seem to be trying to accomplish.



erico - this looks really useful. It seems a bit slow, but I don’t know whether that’s the script or my soon-to-be-updated 12" Powerbook. Appreciate the effort that’s gone into it, and hope we get access to any updates.


Are you running it in the script editor or as a hotkey? It’s pretty fast for me if I use a hotkey, but I use a first-generation macbook (not pro). It’s my sense that Applescript is a lot faster on intel, and (if I recall) Leopard sped things up quite a bit. There are also a few programmed delays (the ‘delay’ commands) that can be adjusted a bit if you have a slower machine.

And of course, I’ll post updates here, especially if the DT team finds a way to get around the two difficulties named above. I think Christian has indicated that the “rich text windows must be open” problem is insurmountable; but perhaps it could be covered up better (worked around) if document windows could be opened as invisible. I’m still hoping that post-2.0, a scripting feature for finding the opening parent for a document window might be added.

I’ve also toyed with the idea of rewriting this one in python. That might not only be faster, but could conceivably allow for more manipulation of the RTF layer to take place. The template solution posted above is, however, also a real issue. If only there were an easy way to merge all those individual note RTFs and not lose the formatting. Hmmm…I need to think about that. Meanwhile, I’m glad when anybody finds a script useful!


I’m new to using scripts, and grabbing the highlighted parts of my text could be pretty handy for me. Can anybody direct me to a topic or help page that explains how to set up and run the script posted above?



Thanks for your script.
I once made a fork of the Annotation template that uses plain text Markdown instead of RTF.
The template has some nice features (provided by DT), such as the link between the source file and the annotation file, and the fact that rerunning it doesn’t create a new annotation file, etc.
It would be terrific if we could improve this template by using your script to populate it with the selected content and a link to the specific passage. It would be the best of both worlds: the ease of use, linkback, append-if-exists etc. of the template, plus your nice populating function.
But as the two scripts have many functions with the same purpose, and use different names for their variables and functions, it is too difficult for me to find a way to reuse your populating function in the Annotation template.
It doesn’t seem that difficult to achieve (I could be wrong on this?), but I can’t make it work. I tried to ditch everything that doesn’t seem useful for the populating (Growl things etc.) and to understand the naming of the variables.
I am not used to Applescript or coding enough to really understand everything.

Has anybody done such a mix based on the Annotation template?