Stream annotations from your PDF reading sessions with DEVONthink

ryanjamurphy · April 27, 2022, 6:30pm

I’ve just published instructions for setting up a perhaps-interesting new automation for your ~~procrastination~~ enjoyment.

In case you don’t want to download a .scpt file from my site, here’s the script in full:

property summaryNotesGroupUUID : "E940D2EB-5B4A-4D29-B64F-E585AA756826"
property readingSessionNotePrefix : "∎ " -- This is a prefix I use to indicate summary notes. If you don't want to use a prefix, switch it to ""
property readingSessionNoteSuffix : " - Reading Session "
property delayAfterReadingSessionInMinutes : 15
property debug : false -- debug flag. See immediately below.
property annotationFileYAML : "annotation-status: new" -- default YAML annotation fields and values. Separate each with "& return &" in order to make sure each field has its own line. E.g., `"annotation-status: new" & return & "some-other-field: someValue"` will give you annotation-status and some-other-field on two different lines.
property newLineCharacter : "
" -- Using "return" does not work when trying to use findAndReplaceText and splitText on newlines. The script uses this variable to clean up the code. 

on run
	if debug then -- debug routine. Switch the debug : false above to true, and run the script manually with a test record selected to immediately test the script on the selected file. Note that this will fail to debug anything to do with delayAfterReadingSessionInMinutes.
		set previousDelayAfterReadingSessionInMinutes to delayAfterReadingSessionInMinutes
		set delayAfterReadingSessionInMinutes to 0
		tell application id "DNtp"
			set theSelection to get the selection
			my performSmartRule(theSelection) -- runs the performSmartRule function below.
		end tell
		set delayAfterReadingSessionInMinutes to previousDelayAfterReadingSessionInMinutes
	end if
end run

-- # Script info
-- Author: @ryanjamurphy
-- Created in April 2022
-- Requires DEVONthink 3 Pro (for the custom metadata feature)

-- ## Explanation
-- This script is designed to be executed by a DEVONthink Smart Rule configured to pick up an increase in the number of annotations on a recently-modified PDF.

-- When everything is all set up, this is how the workflow works:
-- You read a PDF, and make some highlights/strikethroughs/underlines. Fifteen minutes after you last modify the PDF (i.e., after you've "put the PDF down"), this script runs. When it runs, it extracts any of the newly-added annotations (note the assumptions below) into a uniquely-titled "Reading Session" markdown note (see "Generating a unique note title" below). 
-- ## Generating a unique note title
--  Notes will be titled with the syntax `readingSessionNotePrefix (configured as a property above) recordName readingSessionNoteSuffix (again, configured above) date+timestamp`. Like this: `∎ The Recurse Center User’s Manual - Recurse Center - Reading Session 202204271206`

-- ## Set up
-- Before you use the script, you have to configure one thing yourself, and you can modify the prefix and suffix used to distinguish reading notes from other notes.

-- ### You _must_ configure the following
-- #### Where to save reading session notes
-- **Summary note group (summaryNotesGroupUUID):** Notes are saved to a specific DEVONthink group. Choose the group you want the notes to end up in, right-click it, and select "Copy Item Link." Paste that copied value above, for the property `summaryNotesGroupUUID`, deleting the `x-devonthink-item://` part at the front of the link so that only the UUID remains.

-- ### Optional configurations
-- **Reading Session note prefix (readingSessionNotePrefix):** I like to have a reliable, visible prefix in summary note filenames, so I don't get them confused with other notes and so that I can easily find them (or avoid them) when using "quick open"/"quick switcher"-type features. Switch this property to whatever you want. If you don't want a prefix at all, make the property "".
-- **Reading Session note suffix (readingSessionNoteSuffix):** Similar to the above, this property appends the specified text to every reading note filename, _before_ the date+timestamp. Change it to whatever you want or make the property "" to avoid using a suffix at all (except for the date+timestamp, which is necessary).
-- **The time to wait after annotating a reading before extracting highlights (delayAfterReadingSessionInMinutes):** How many minutes' delay you want between when you've annotated a reading and when the annotations are extracted into a new Reading Session note. I recommend that this be at least 10 minutes and not more than an hour, unless you are careful about how you configure the Smart Rule that executes this script.
-- **Annotation file YAML (annotationFileYAML):** If you're using a markdown editor that leverages YAML (https://en.wikipedia.org/wiki/YAML), you may want certain fields automatically inserted in your Reading Session Notes. Modify this property to change this. Set it to just "" if you don't want to use YAML. (I use `annotation-status: new` to be able to query my reading session notes for newly added annotations in Obsidian (https://obsidian.md) with the Dataview plugin (https://blacksmithgu.github.io/obsidian-dataview/).)



on performSmartRule(theRecords)
	
	-- Calculate a date+timestamp to make sure the summary notes created by this script are unique.
	set {year:yr, month:mn, day:dy, hours:hr, minutes:mins} to (current date)
	set dateandtimestamp to "20" & my pad(yr as integer) & my pad(mn as integer) & my pad(dy as integer) & my pad(hr as integer) & my pad(mins as integer) -- Got this from https://macscripter.net/viewtopic.php?id=44567 as a quick and dirty way of getting a Zk-style timestamp. It didn't include the "20" in "2022" so I prepended it manually. That'll become a problem in 78 years or so...
	set datestamp to "20" & my pad(yr as integer) & my pad(mn as integer) & my pad(dy as integer)
	set timestamp to my pad(hr as integer) & ":" & my pad(mins as integer)
	
	tell application id "DNtp"
		set theInbox to inbox
		set summaryNotesGroup to get record with uuid summaryNotesGroupUUID
		
		repeat with eachRecord in theRecords
			
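			-- First, the script makes sure it's been at least `delayAfterReadingSessionInMinutes` minutes (fifteen by default) since the record was modified, opened, created, or added.
			-- This is because we do not want the script to execute every time we make any change at all. Instead, the goal is to extract the highlights once per "reading session." So, the script skips any record that has been touched more recently than that.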
			set recordModified to eachRecord's modification date
			set recordOpened to eachRecord's opening date
			set recordCreated to eachRecord's creation date
			set recordAdded to eachRecord's addition date
			if (((current date) - recordModified) > delayAfterReadingSessionInMinutes * minutes) and (((current date) - recordOpened) > delayAfterReadingSessionInMinutes * minutes) and (((current date) - recordCreated) > delayAfterReadingSessionInMinutes * minutes) and (((current date) - recordAdded) > delayAfterReadingSessionInMinutes * minutes) then -- If this is true, it has been `delayAfterReadingSessionInMinutes` since the file has changed, and we can therefore go ahead and create a Reading Session note.
				
				-- Before we get too excited, the script will make sure the file content has _actually_ changed. (We don't want to run the script and waste resources if the file's contents haven't actually been modified.)
				set currentFileSize to eachRecord's size
				set previousFileSize to get custom meta data for "Previous Filesize" from eachRecord
				if previousFileSize is missing value then
					set previousFileSize to currentFileSize - 1
				end if
				if (currentFileSize ≠ previousFileSize) then -- the file content has changed
					
					-- Create the annotation note name.
					set annotationNoteName to readingSessionNotePrefix & (eachRecord's (name without extension)) & readingSessionNoteSuffix & dateandtimestamp
					
					-- The script uses DEVONthink's Summarize Highlights feature to extract annotations from the PDF.
					set highlightsSummary to summarize highlights of records eachRecord as list to markdown in incoming group -- The script creates this summary in the inbox, because it will be deleted if there are no new annotations. 
					
					if highlightsSummary is not missing value then -- Highlights were successfully summarized, now we have to clean the resulting syntax
						
						-- Get the text of the newly-created summary, then convert the list into an array of the new highlights using splitText.
						set highlightsSummaryText to plain text of highlightsSummary
						set highlightsArray to my splitText(highlightsSummaryText, (newLineCharacter & "* "))
						
						-- Make sure there are new annotations by comparing the current annotation count to the previous annotation count metadata.
						set newAnnotationsCount to eachRecord's annotation count
						set previousAnnotationCount to get custom meta data for "Previous Annotation Count" from eachRecord
						if previousAnnotationCount is missing value then
							set previousAnnotationCount to 0
						end if
						set numberOfNewAnnotations to newAnnotationsCount - previousAnnotationCount
						if (numberOfNewAnnotations > 0) then -- There are some new annotations.
							
							-- The script is now going to iterate through the array of annotations.
							set annotationIterator to 0
							-- Extract the first item in the array of annotations and get the title from it. This is the default title generated by DEVONthink's Summarize Highlights feature.
							set annotationFileOriginalHeader to the first item in highlightsArray
							set linesOfAnnotationFileHeader to my splitText(annotationFileOriginalHeader, newLineCharacter)
							
							-- Prepend a YAML header to the note, if using.
							if annotationFileYAML is not equal to "" then
								set annotationFileHeader to "---" & return & annotationFileYAML & return & "---" & return & return & the first item in linesOfAnnotationFileHeader
							else
								set annotationFileHeader to the first item in linesOfAnnotationFileHeader
							end if
							
							-- Initialize the reading session annotations with a subtitle.
							set newAnnotations to annotationFileHeader & return & "Reading session from [[" & datestamp & "]] at " & timestamp & return & return
							
							-- Iterate through the highlights, and extract any newly-added annotations to the newAnnotations array.
							repeat with eachAnnotation in highlightsArray
								if annotationIterator > previousAnnotationCount then
									set newAnnotations to newAnnotations & return & "* " & eachAnnotation -- Prepend each line with "* " in order to keep each line consistent for cleanup in a moment.
								end if
								set annotationIterator to annotationIterator + 1
							end repeat
							
							-- Clean up the extracted annotations using replaceText. DEVONthink by default uses Critic Markup (https://fletcher.github.io/MultiMarkdown-6/syntax/critic.html), which I don't want to use. DEVONthink also keeps the highlighting around highlights, which I want to remove and convert to quotes. (I can then later highlight these highlights, if I want.)  I also don't want to use asterisks for bulleted lines. If you want to remove these and/or add your own, do so while referencing an actual Summarize Highlights file so that you're sure it's replacing exactly and only what you want and adding exactly and only what you want!
							set newAnnotations to my replaceText(newAnnotations, "* {==", "- > ") -- Get rid of criticmarkup's highlights
							set newAnnotations to my replaceText(newAnnotations, newLineCharacter & "* ", newLineCharacter & "- ") -- Switch asterisk-based bulleted lists with hyphen-based bulleted lists
							set newAnnotations to my replaceText(newAnnotations, "==}" & newLineCharacter, "" & newLineCharacter) -- Get rid of criticmarkup's highlights
							set newAnnotations to my replaceText(newAnnotations, "\\", "") -- Get rid of backslashes, which DEVONthink inserts to escape things sometimes.
							set newAnnotations to my replaceText(newAnnotations, (")" & return & "- "), (")" & return & return & "- ")) -- ensure there's a blank line between headings and the next extracted highlight.
							
							-- "Save" the new annotations by setting our summary note's `plain text` to our newly-assembled and cleaned text.
							set plain text of highlightsSummary to newAnnotations
							
							-- Rename the summary note with our unique, date+timestamped name.
							set name of highlightsSummary to annotationNoteName
							
							-- Remember how the summary note was initially created in the inbox? Now the script will move it to the final destination you configured at the top of the script.
							move record highlightsSummary from incoming group to summaryNotesGroup
							
							-- Update the PDF's Previous Annotation Count metadata.
							add custom meta data newAnnotationsCount for "Previous Annotation Count" to eachRecord
							
							-- Update the PDF's Previous Filesize metadata.
							add custom meta data currentFileSize for "Previous Filesize" to eachRecord
							
							-- Post a notification to let you know what we've done.
							if numberOfNewAnnotations is equal to 1 then -- only one newly-extracted annotation
								set notificationMessage to (numberOfNewAnnotations as text) & " annotation has been automatically extracted from \"" & eachRecord's name without extension & "\"."
								
							else -- more than one newly-extracted annotation
								set notificationMessage to (numberOfNewAnnotations as text) & " annotations have been automatically extracted from \"" & eachRecord's name without extension & "\"."
							end if
							display notification notificationMessage
						else
							-- There are no new annotations, so (1) update the previous filesize metadata so the script doesn't trigger again and (2) delete the temporary Summarize Highlights note we created in the inbox.
							add custom meta data currentFileSize for "Previous Filesize" to eachRecord
							delete record highlightsSummary
						end if
						
					end if
				end if
			end if
		end repeat
	end tell
end performSmartRule

on pad(v) -- got this from https://macscripter.net/viewtopic.php?id=44567
	return text -2 thru -1 of ((v + 100) as text)
end pad

-- Apple utility functions, from https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/ManipulateText.html
on splitText(theText, theDelimiter)
	set AppleScript's text item delimiters to theDelimiter
	set theTextItems to every text item of theText
	set AppleScript's text item delimiters to ""
	return theTextItems
end splitText

on replaceText(this_text, search_string, replacement_string)
	set prevTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to the search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to the replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to prevTIDs
	return this_text
end replaceText

cgrunenberg · April 28, 2022, 7:47am

Thanks for sharing this quite impressive script! I guess one assumption is that all annotations are added from the beginning to the end, e.g. inserting annotations before already existing annotations would probably cause some troubles?

ryanjamurphy · April 28, 2022, 9:15am

Absolutely. I can’t think of a way to pull annotations from the PDF or the summary file in the order they were added.

I suppose if Summarize Highlights could do that instead of in the order they appear, it would be possible, but I don’t think PDF annotations store any metadata that would facilitate that…
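
To make that assumption concrete, here is a minimal sketch in plain AppleScript (no DEVONthink needed) that mimics the script's "skip the first N items" loop. The list contents are made-up sample data, not real Summarize Highlights output; the first item stands in for the summary's header.

```applescript
-- Hypothetical sample data: highlights listed in document order, as in the summary.
-- Two annotations were already extracted in an earlier session; a third was then
-- added on page 2, i.e. *between* the two existing ones.
set previousAnnotationCount to 2
set highlightsInDocumentOrder to {"header", "page 1 (old)", "page 2 (NEW)", "page 5 (old)"}

-- Same counting logic as the script: keep only items whose index exceeds the previous count.
set extracted to {}
set annotationIterator to 0
repeat with eachAnnotation in highlightsInDocumentOrder
	if annotationIterator > previousAnnotationCount then
		set end of extracted to contents of eachAnnotation
	end if
	set annotationIterator to annotationIterator + 1
end repeat

return extracted -- {"page 5 (old)"}
```

The old page 5 highlight gets extracted a second time and the genuinely new page 2 highlight is skipped, which is exactly the trouble described above.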

cgrunenberg · April 28, 2022, 9:18am

PDF annotations actually have a date (see Document > Annotations inspector). However, what’s the benefit of such an order? At least for me (and my way of working/thinking) the result would be a mess

ryanjamurphy · April 28, 2022, 11:10am

Yeah, I think I agree with you. Sequential probably makes the most sense in most cases. Maybe I’ll look into manual extraction via scripting to get access to that annotation metadata at some point, though!

cgrunenberg · April 28, 2022, 11:14am

The current approach has of course the advantage that it’s not limited to PDF documents and supports RTF & Markdown documents too.

mlevison · May 4, 2022, 4:13pm

I downloaded the rule (thanks), installed in the SmartRules folder. Configured it with the UUID of a brand new folder. Hooked it up to a SmartRule. I watched the SmartRule find Annotations to export. I even did a manual “Apply Rule” and nothing happens.

My gut feeling says the biggest risk is that UUID isn’t found, how do I debug?

FWIW my Smart Rule:
Extract_Annotations_to_Markdown_File

Eager in Ottawa
Mark

mlevison · May 4, 2022, 4:25pm

I forgot to add, I’m a recovering software developer. I know I will need to spin up a debugger, I even see a fragment of debug code in the Script. What I don’t know is how to feed the script a file in debugging mode. (I also know nothing about the Apple Script debugger.)

Forgetful in Ottawa
Mark

ryanjamurphy · May 4, 2022, 4:58pm

hmm. If you select the smart rule in the DEVONthink sidebar, it should give you a list of the records it will target in the View pane. Like this:

Do you see a list of records?

mlevison · May 4, 2022, 5:34pm

…Yes this is the list of files, it should be trying to process.

Old fashioned Mark, i.e. the programmer from 30 years ago, would just try to debug this from STDOUT/STDERR and just log the flow. (Applescript makes my head hurt - it makes ‘C’ look readable).
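
For anyone wanting that kind of log-the-flow tracing, a rough sketch follows. It assumes DEVONthink 3's `log message … info …` scripting command, which writes to DEVONthink's Log panel; `logStep` is a hypothetical helper name, not part of the original script. Select a test record in DEVONthink and run this from Script Editor.

```applescript
-- Hypothetical helper (not in the original script): write one line to DEVONthink's Log panel.
on logStep(theStep, theDetail)
	tell application id "DNtp"
		log message theStep info theDetail
	end tell
end logStep

-- Trace what the script would see for each currently selected record.
tell application id "DNtp"
	set theSelection to get the selection
	repeat with eachRecord in theSelection
		my logStep("Checking record", (name of eachRecord))
		my logStep("Annotation count", ((annotation count of eachRecord) as text))
		my logStep("File size", ((size of eachRecord) as text))
	end repeat
end tell
```

The same `my logStep(…)` calls could be dropped into `performSmartRule` at each branch to see how far a given record gets.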

ryanjamurphy · May 4, 2022, 5:39pm

hmm. Do you mind enabling the columns “PDF annotations”, “Previous PDF annotations”, and “Previous Filesize”? The latter two are custom metadata. Then screenshot those.

mlevison · May 4, 2022, 5:56pm

Neat I didn’t realize that playing with Scripts added custom Metadata. I file that under evil plans for world takeover.

ryanjamurphy · May 4, 2022, 6:08pm

Interesting. It looks like the script did fire on those files, because it added values for “Previous Annotation Count” and “Previous Filesize” to each record.

Now, the question is, did it successfully create the Reading Session notes (and if so, where?) or did it fail on record creation somehow…?

Maybe use the following Smart Group to see if you can see the Reading Sessions?
This Week.dtSmartGroup.zip (470 Bytes)

It’s just this:

[Image: Screen Shot 2022-05-04 at 3.38.18 PM]

PS: Sorry if others are getting a bunch of notifications on this thread, but I figure public troubleshooting might help someone else in the future. Don’t forget you can change the “Watching” status at the bottom of the thread to a different alert level!

mlevison · May 4, 2022, 6:40pm

Stunning discovery the annotations are getting created and appear in my inbox. Back to my guess that my folder UUID is strange. FWIW I would be ok if the new items were tagged and I relied on a smart rule to move them to a destination folder.

Also supplementary question - have you automated export from DT to Obsidian or anywhere else?

ryanjamurphy · May 4, 2022, 9:07pm

Copy and paste that property here?

Tagging and smart rule would be pretty trivial, too.

mlevison · May 4, 2022, 9:32pm

property summaryNotesGroupUUID : "6ABC0EFF-5096-4FC3-B2D9-1E409A9B1FFF"

ryanjamurphy · May 4, 2022, 9:41pm

Looks right. I imagine if you paste that after x-devonthink-item://, it’ll open the group you want?
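
The same check can be done from Script Editor. This is only a sketch: it assumes `get record with uuid` returns `missing value` when nothing matches in the currently open databases, and it uses the UUID quoted above as a stand-in for your own.

```applescript
-- Stand-in value from the post above; replace it with your own group's UUID.
property summaryNotesGroupUUID : "6ABC0EFF-5096-4FC3-B2D9-1E409A9B1FFF"

tell application id "DNtp"
	set summaryNotesGroup to get record with uuid summaryNotesGroupUUID
	if summaryNotesGroup is missing value then
		-- Assumption: an unknown UUID yields missing value rather than an error.
		display dialog "No record with that UUID was found in the open databases."
	else
		display dialog "Found \"" & (name of summaryNotesGroup) & "\" at " & (location of summaryNotesGroup) & " in database \"" & (name of database of summaryNotesGroup) & "\"."
	end if
end tell
```

If the dialog names the group you expect, the UUID itself is fine and the problem lies elsewhere, for example in the move step.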

I’ll draft a tag version next time I’m at my desk!

Medievalist · May 4, 2022, 9:49pm

No this is super. I am NOT a programmer, but I do use Applescript and I learn by looking at code written by people who know what they are doing.

ryanjamurphy · May 5, 2022, 12:21am

Now that I’ve looked at the code, I’m not sure what could be causing the “move to the target group” function to fail for you. That part of the script involves only four lines:

- Give the script your group’s UUID: `property summaryNotesGroupUUID : "E940D2EB-5B4A-4D29-B64F-E585AA756826"`
- Get the targeted group: `set summaryNotesGroup to get record with uuid summaryNotesGroupUUID`
- Create the extracted highlights record in the inbox: `set highlightsSummary to summarize highlights of records eachRecord as list to markdown in incoming group` (“in incoming group” indicates the global inbox)
- Move the extracted highlights record to the targeted group: `move record highlightsSummary from incoming group to summaryNotesGroup`

Alas. Maybe someone else will have the same trouble and we can triangulate.

In the meantime, add these lines…

-- Add a tag to be able to easily find summary notes later
set tags of highlightsSummary to (tags of highlightsSummary) & "extracted_highlights"

…below the line `set name of highlightsSummary to annotationNoteName`

and change “extracted_highlights” to whatever tag you desire. Then make a Smart Rule that moves files with that tag from the Inbox to your desired group.

rfog · May 5, 2022, 7:45am

It makes obfuscated C look readable to me as well.

AFAIK, there is no debugger or debug option for Apple Script/Shortcuts. They are “magic” and do not need any debugging. [Irony Mode]

[Sorry for the OT]