Extract Skim Notes from PDFs - for searching

This experimental script will extract Skim Notes from PDFs that have such notes. The script skips documents in a selection that are not PDFs, and skips PDFs that do not have Skim Notes. There is no synchronization: if the PDF is updated, the extracted notes file is not. The script depends on the ‘skimnotes’ executable that is found at the location indicated by ‘skimnotesExecLoc’ in the script. If you do not have Skim installed in /Applications you’ll need to update the path (say, to ‘~/Applications …’). Also modify the name of the notes file to your own liking.

The point of this script is to provide a workaround for the DTPO shortcoming that Skim annotations are not searchable. If you have note files generated by this script, then at least you can search them and then link to the underlying PDF. Very clumsy, but it works. Perhaps someday DTech will incorporate the Skim Framework. [Edit: in order to use that Framework to search and modify Skim notes (annotations).]

If you make useful modifications to this script, please post them. For example, the “Annotation” scripts could incorporate this technique. Or this technique could be used to label PDFs that have Skim notes.

-- get Skim Notes from PDFs that have these notes
-- creates an RTF file with the notes
-- Links the note file to the PDF
-- use at your own risk: loss of data is your risk
-- v1.1 20100504

set skimnotesExecLoc to "/Applications/Skim.app/Contents/SharedSupport/skimnotes"

tell application id "com.devon-technologies.thinkpro2"
	try
		set theseItems to the selection
		if theseItems is {} then error "Please make a selection..."
		repeat with thisItem in theseItems
			if type of thisItem is PDF document then
				set thisItemPath to path of thisItem
				set skimnotesCmd to skimnotesExecLoc & " get -format text " & quoted form of thisItemPath & " -"
				set thisNotes to do shell script skimnotesCmd
				if thisNotes is not "" then
					set thisURL to "x-devonthink-item://" & uuid of thisItem
					set thisName to "Skim Notes: " & name of thisItem
					create record with {name:thisName, rich text:thisNotes, type:rtf, URL:thisURL} in current group of current database
					set thisNotes to ""
				end if
			end if
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell
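Since the script fails if ‘skimnotes’ is not where ‘skimnotesExecLoc’ says it is, a check along these lines could go at the top of the script. This is only a sketch, not part of the original: the alert wording is made up, and it assumes ‘skimnotesExecLoc’ is an absolute path.

```applescript
-- Sketch only: verify that skimnotes exists and is executable before using it.
-- Assumption: skimnotesExecLoc is an absolute path ("~" is not expanded inside quoted form).
set skimnotesExecLoc to "/Applications/Skim.app/Contents/SharedSupport/skimnotes"

try
	-- "test -x" exits non-zero when the file is missing or not executable,
	-- which makes do shell script raise an error
	do shell script "test -x " & quoted form of skimnotesExecLoc
on error
	display alert "skimnotes not found" message "Update skimnotesExecLoc to match where Skim is installed." as warning
	return
end try
```

If Skim lives somewhere else (say, in your home Applications folder), spell out the full path in ‘skimnotesExecLoc’ so the check and the later shell command agree.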

It’s already embedded but only used to display Skim annotations right now.

For those who are predisposed to indexing, in an indexed folder the skim note file is indexed and searchable. That’s how I’m using it now.

I should also say here, thanks for the script, korm. It works well for me too.

Skim can be set to “Automatically save Skim notes backups” in its preferences. With that option on, Skim writes a .skim file with the same name as the PDF and puts it in the same folder as the PDF.

Skim will save over the previous version on each close of the file, so the notes are always up to date.

Now, if DevonThink indexes that folder, the user can pick up and search the note. But if it is based on highlighted text, it will also pull up the PDF with the original text.

Create a SmartGroup that has the condition “Name” is “.skim”. This will pull out only the annotations, including notes and highlighted text from Skim.

Nice script. It’s a nice alternative to indexing Skim notes backups. That method is much more convenient, but I just can’t stand the awful font that Skim notes use. Anyone know how to change that, by the way?

In Skim v1.3.10 (and earlier releases in 2010) the note export font is improved.

It’s true that the font isn’t as bad as it was, but I still prefer my chosen formatting.

Anyway, I was wondering if there was a way to make this script save over old notes, i.e., have the new note replace its earlier version?

Sure, there’s a “way” - just no will at this time :smiley: . Maybe I’ll have time come Spring - or someone else can take a hack at it.
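For anyone who wants to take that hack, one possible approach is to delete any earlier notes record of the same name before creating the new one. The handler below is an untested sketch, not korm's method. The handler name is made up, and it assumes the old notes record still sits in the current group under the name the script gave it. Deleting records is destructive, so try it on a test database first.

```applescript
-- Hypothetical handler (not part of the original script): replace an earlier
-- notes record of the same name instead of creating a duplicate.
on replace_notes_record(thisName, thisNotes, thisURL)
	tell application id "com.devon-technologies.thinkpro2"
		set theGroup to current group
		-- Assumption: the earlier notes record is in the current group and was not renamed
		set oldRecords to (children of theGroup whose name is thisName)
		repeat with oldRecord in oldRecords
			delete record oldRecord
		end repeat
		-- Same record properties as the create record line in the original script
		return create record with {name:thisName, rich text:thisNotes, type:rtf, URL:thisURL} in theGroup
	end tell
end replace_notes_record
```

In the original script, the create record line would become a call such as: my replace_notes_record(thisName, thisNotes, thisURL).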

As far as I can tell, these skim files are not indexed by DTP. Am I missing something?

.skim files in the same folder as the .pdf are indexed when that folder is indexed.

I don’t see how this happens.

Korm (or anyone else!),

I am wondering whether you have changed how you do this recently? I just started using Skim more heavily and want to have those annotations searchable in DEVONthink. I take it that sometime (recently?) DEVONthink started to support letting Skim deposit its [same name as PDF].skim file into the same directory as the original PDF whenever the “Automatically save Skim notes backups” option is checked and a save is performed.

So that got me thinking… First, I found that instead of importing that file into DEVONthink and having it be static, I could just index it into the same group as the PDF, and have the annotations come up whenever I search in DEVONthink. Doing this by hand rather than through a script seems to work…

And thus I’d be happy to update the script to index rather than import the file, and thus allow for automatic updating. But before I do that, I was curious if anyone knew for a fact that this was a bad idea? Or is it just one that hasn’t been tried (lately)?

Or maybe to ask the question another way: is there a better way to keep skim PDFs and annotations together now than there was when these posts were originally written? Does anyone have a happy history of evolved perfection? Just curious!

best,
Erico

I think the indexing change would be a good mod, @erico. If you do do it, thanks. Over here I’m considering indexing PDFs more frequently and putting them into Dropbox or Box for sync with the iPad. (So I can stop futzing with DTTG, which is “old” and creaky – but that’s another thread… :confused: )

This is admittedly a quick hack, but it seems to work for me. I’m seeing live updating of the Skim annotations now. It’s a little irritating that one cannot rename the .skim annotation files in DEVONthink, but I think that since version 2.0 DEVONthink names have been coerced to correspond to file system names (and vice versa)…

I’ll test this more and update it here, but I didn’t realize until doing this that this is more or less exactly what I want out of Skim + DEVONthink (short of DEVONthink actually supporting an annotations window from the Skim toolkit!).

Thanks Korm for your older script…it inspired me to make a somewhat different move that I think is only recently possible with devonthink.


-- Index the .skim file for selected PDFs
-- This indexes the .skim file of each selected PDF if, and only if, the .skim file is in the same directory as the PDF
-- Skim places the .skim file in the same directory as the PDF being annotated if one turns on "Automatically save Skim notes backups" in Skim's preferences.
-- This script allows one to import a PDF into DEVONthink, open it in Skim, save, and then keep an updating record of the "annotations" in the same folder as the original PDF. Do not rename the annotations file, or this will cease to work.
-- Use at your own risk!!
-- Written by Eric Oberle, based on a script by Korm from the DEVONthink forums


tell application id "com.devon-technologies.thinkpro2"
	try
		set theseItems to the selection
		if theseItems is {} then error "Please make a selection..."
		repeat with thisItem in theseItems
			if type of thisItem is PDF document then
				set parents_of_item to parents of thisItem
				set thisItemPath to path of thisItem
				set parsed_path to my parse_filename(thisItemPath)
				set skim_doc to (the_path of parsed_path) & "/" & (the_filename of parsed_path) & ".skim"
				repeat with this_parent in parents_of_item
					set skim_doc_record to indicate skim_doc to this_parent
					
					if skim_doc_record is not missing value then
						set thisURL to "x-devonthink-item://" & uuid of thisItem
						set URL of skim_doc_record to thisURL
						
						set comment of thisItem to comment of thisItem & "\nSkim annotation at : " & "x-devonthink-item://" & uuid of skim_doc_record
						set thisNotes to ""
					end if
				end repeat
			end if
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell


on replace_chars(this_text, search_string, replacement_string)
	set AppleScript's text item delimiters to the search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to the replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to ""
	return this_text
end replace_chars



on perl_replace(inputstring, targetstring, replacementstring)
	
	set inputstring to my replace_chars(inputstring, (ASCII character 194), "<br>")
	set inputstring to my replace_chars(inputstring, "|", "+vertical-bar+")
	set inputstring to my replace_chars(inputstring, "'", "&#8216;")
	set shellscript to "/usr/bin/perl -e '$rpl=q|" & replacementstring & "|;$trgt=q|" & targetstring & "|;$thisvar=q|" & inputstring & "|;$thisvar=~s|$trgt|" & replacementstring & "|gi; print $thisvar;'" ---log shellscript
	--	try
	set theResult to (do shell script shellscript)
	
	set theResult to my replace_chars(theResult, "+vertical-bar+", "|") as Unicode text
	set theResult to my replace_chars(theResult, "&#8216;", "'") as Unicode text
	(*	on error
		set theResult to inputstring
	end try *)
	return theResult
end perl_replace


on rev_string(the_string)
	set x to (reverse of every character in (the_string as Unicode text)) as Unicode text
	return x
end rev_string


on parse_filename(this_filename)
	-- returns a record: {the_extension, the_filename, the_path}
	set reverse_filename to my rev_string(this_filename)
	set reverse_regex to "([^\\.]*)[\\.]{0,1}([^\\/]*)\\/{0,1}(.*)"
	
	set reverse_parse to my perl_replace(reverse_filename, reverse_regex, "$1\\n$2\\n$3") as Unicode text
	set return_values to every paragraph of reverse_parse
	set returned_extension to my rev_string(first item of return_values)
	set return_file_name to my rev_string(second item of return_values)
	set returned_path to my rev_string(third item of return_values)
	set parsed_path to {the_extension:returned_extension, the_filename:return_file_name, the_path:returned_path}
	
	return parsed_path
end parse_filename
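
The script above only needs the PDF's path with the extension swapped for ".skim", so the reverse-and-perl parsing could probably be replaced with plain text item delimiters. The handler below is a sketch with a made-up name (skim_path_for), not part of the script as posted. It assumes the path is a POSIX path and that the extension is whatever follows the last dot in the file name.

```applescript
-- Hypothetical helper (not part of the original script): derive the path of the
-- .skim file that sits next to a PDF, without calling perl.
on skim_path_for(pdf_path)
	set saved_delimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "."
	set path_parts to text items of pdf_path
	-- Only strip the last piece if it is a real extension, i.e. the last dot
	-- belongs to the file name and not to a folder name earlier in the path
	if (count of path_parts) > 1 and (last item of path_parts) does not contain "/" then
		-- Rejoin everything before the last dot (the delimiter is still ".")
		set base_path to (items 1 thru -2 of path_parts) as text
	else
		set base_path to pdf_path
	end if
	set AppleScript's text item delimiters to saved_delimiters
	return base_path & ".skim"
end skim_path_for
```

For example, skim_path_for("/Users/me/Papers/smith.2010.pdf") returns "/Users/me/Papers/smith.2010.skim". In the script, the two lines that build parsed_path and skim_doc could then become: set skim_doc to my skim_path_for(thisItemPath).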