Customize the Output of Summarize Highlights

jrv · October 26, 2019, 10:26pm

Extracting highlighted text is a key aspect of my knowledge workflow. Up until this point, I have been using Zotero and Zotfile to accomplish this, but Zotero’s focus on citation management rather than knowledge management has restricted its usefulness. I’ve purchased DT3 with the intention of migrating my workflow to it. The problem is that Summarize Highlights produces a format that I’m not comfortable with, and one that does not lend itself to easy export to other tools like OmniOutliner.

There have been several discussion threads on this topic without a satisfactory resolution. The best recommendation was to create an AppleScript to parse attribute runs. I spent several hours trying to write a script, but I’ve been unsuccessful. While I have coding skills, I’m largely unfamiliar with AppleScript, and there is very little support on the wider web on how to use the Text Suite to extract and format RTF. Most importantly, I feel that any script I write now will need to be continually updated as the format of Summarize Highlights changes in future releases. I’d suggest parsing this output is a hack at best. Is there a better way to accomplish this that I’m missing?

Moving forward, a template solution would be best. Releasing a highlight AppleScript that we could modify to our own needs would also be a solution. Until one of these solutions emerges, is there something I can do now to reformat the output of Summarize Highlights?

Thanks in advance,
Jason

jongilizwe · October 27, 2019, 5:52am

I’d use Skim - it has great built-in templating for exporting notes, is free, and Skim annotations are supported in DT (Mac only but easy to convert to standard PDF annotations if you need them on iOS).

jrv · October 28, 2019, 10:56pm

Thank you for your suggestion.

I agree Skim would work, but it doesn’t help to integrate my workflow in DT. I can quickly open the PDF in Zotero and get the exact format I require.

Any other scripting suggestions within DT?

Thanks,
Jason

jrv · November 1, 2019, 8:57am

I threw a day at this project and ended up getting my script to work.

As a note, the script supports joining highlights across pages: create a highlight note with the text “JOIN” and the highlight is joined to the previous one.

Hope this script starts a conversation towards adding native functionality.

Jason Virtue

Convert Summary.pdf (50.8 KB)

dansroka · November 1, 2019, 12:37pm

Cool, thanks for posting it. Would you mind pasting the code into the forum itself? When I try to grab the text from the PDF, the linebreaks get all messed up. Just put [code] before the code, and end with [/code].

jrv · November 1, 2019, 10:04pm

Thanks for pointing out the code command:


--Script: Convert Summary
--Version: 1.0
--Author: Jason Virtue

tell application id "DNtp"
	if selection is not {} then
		
		repeat with thisRecord in (selection as list)
			
			set rtfText to "" as styled text
			
			
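			--name the new record after the summary: drop the last 12 characters of its
			--filename (presumably the summary suffix plus extension) and append " Highlights"
			set newFileName to filename of thisRecord as string
			set newFileName to characters 1 thru ((count newFileName) - 12) of (newFileName as text)
			set newFileName to newFileName & " Highlights"
			set newFileName to (newFileName as string)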
			
			
			
			set newRecord to create record with {name:newFileName, type:rtf, rich text:rtfText} in current group
			
			
			tell text of thisRecord
				set pdfLink to ""
				set linkText to ""
				set highlight to ""
				set pdfNote to ""
				
				set pdfLinkBuffer to ""
				set linkTextBuffer to ""
				set highlightBuffer to ""
				set pdfNoteBuffer to ""
				
				set lastItemWasHighlight to false
				set lastHighlight to ""
				set isHighlight to false
				repeat with parasOfText in attribute runs
					
					set theBackground to ((background of parasOfText) as string)
					set isHighlight to false
					
					
					if theBackground is not "" then
						set isHighlight to true
						set highlight to highlight & my trimText(parasOfText as string)
					else if exists URL of parasOfText then
						set pdfLink to my trimText(URL of parasOfText as string)
						set linkText to my trimText(parasOfText as string)
					else
						set pdfNoteString to my trimText(parasOfText as string)
						if pdfNoteString is not "" then
							set pdfNote to pdfNoteString
						else
							--hack: treat a whitespace-only run as part of a highlight
							--so that it does not trigger the flush below
							set isHighlight to true
						end if
					end if
					
					tell text of newRecord
						--make new paragraph at end with data return & theBackground & return
					end tell
					
					if lastItemWasHighlight is true and isHighlight is false then
						if highlight is not "" then
							
							if pdfNote is "JOIN" then
								set highlight to highlightBuffer & " " & highlight
								set pdfNote to ""
							else
								if highlightBuffer is not "" then
									my outputHighlight(newRecord, pdfNoteBuffer, highlightBuffer, linkTextBuffer, pdfLinkBuffer)
								end if
								set pdfLinkBuffer to pdfLink
								set linkTextBuffer to linkText
								set highlightBuffer to highlight
								set pdfNoteBuffer to pdfNote
								set highlight to ""
								set pdfNote to ""
							end if
							
						end if
						
						
					end if
					
					set lastItemWasHighlight to isHighlight
					
					
				end repeat
			end tell
			
			
			if highlightBuffer is not "" then
				my outputHighlight(newRecord, pdfNoteBuffer, highlightBuffer, linkTextBuffer, pdfLinkBuffer)
			end if
			
			if highlight is not "" then
				my outputHighlight(newRecord, pdfNote, highlight, linkText, pdfLink)
			end if
			
			
			tell text of newRecord
				set its size to 12
			end tell
			
		end repeat
		
		
		
	end if
end tell

on trimText(textToTrim)
	set ret to ""
	tell application id "DNtp"
		set wordCount to count words of (textToTrim as string)
		if wordCount = 0 then
			set ret to ""
		else
			set ret to texts from first word to last word of (textToTrim as text)
		end if
	end tell
	return ret
end trimText

on outputHighlight(newRecord, pdfNote, highlight, linkText, pdfLink)
	tell application id "DNtp"
		tell text of newRecord
			
			set charCount to 0
			
			if pdfNote is not "" then
				make new paragraph at end with data pdfNote
				set charCount to (-1 * (count pdfNote))
				tell characters (charCount) thru -1 to set its font to "Helvetica Neue Bold"
			end if
			
			make new paragraph at end with data ": "
			
			if highlight is not "" then
				make new paragraph at end with data highlight
				set charCount to (-1 * (count highlight))
				tell characters (charCount) thru -1 to set its font to "Helvetica Neue"
			end if
			
			
			
			if pdfLink is not "" then
				set linkTest to " ((" & linkText & "))"
				make new paragraph at end with data linkTest
				set charCount to (-1 * (count linkTest))
				set URL of characters (charCount + 3) thru (-1 - 2) to pdfLink
				--set the URL of characters -4 thru -1 to "http://www.google.com"
				set font of characters (charCount) thru -1 to "Helvetica Neue"
			end if
			
			make new paragraph at end with data return
		end tell
		
		
	end tell
end outputHighlight

jongilizwe · November 2, 2019, 4:52am

Hmm I just get an error every time at the repeat with parasOfText … line

error "DEVONthink 3 got an error: Can’t make every attribute run of every text of content id 10672 of database id 2 into type string." number -1700 from every attribute run of every text of content id 10672 of database id 2 to string

this doesn’t work either:

tell application id "DNtp"
set t to item 1 of (selection as list)
tell text of t
	attribute runs
end tell
end tell

trying it on files with native PDF annotations that are viewable in DT, Preview and Highlights

jrv · November 3, 2019, 12:32am

You have to run it on the RTF that “Summarize Highlights” produces.

I haven’t as of yet found a way to link the whole process together in a seamless way. I remember from another thread that 3.0.2 will correct an error with smart rules that will permit this.

Jason

dansroka · November 3, 2019, 2:15pm

Oh very cool. Nice solution!

I especially like how it removes the highlighting and moves the line number to the end of each highlight.

And ah, so that’s how you make a new RTF file with Applescript!

Good work!

dansroka · November 3, 2019, 2:59pm

Hmm, I’m finding that the script is not assigning line numbers correctly in a couple of ways:

The first link is not being used: the first highlight uses the link for the second highlight, and the last highlight just repeats the second-to-last link.

See how the first highlight “the natural world…” doesn’t have the link to line 11, but line 14.

Also if you create one highlight summary from multiple documents (shift select the docs, then run “summarize highlights”), and run the script on that, the last highlight for each document gets moved to the document that follows it:

See how line 17 (“we find nature to be…”) is appearing not at the end of the document “Emerson 1” but at the beginning of the document “Emerson 2”.

dansroka · November 3, 2019, 4:20pm

To fix that glitch I encountered, I took a stab at revising your script. I ended up streamlining how it runs as well. This code is not as robust as yours, but since the highlight summary docs have such consistent formatting, I could make some assumptions about what goes where.

This will work with summaries made from single files, or multiple files.

--Script: Convert Formatting of Highlight Summary
--Author: Jason Virtue, with revisions by Daniel Sroka
--https://discourse.devontechnologies.com/t/customize-the-output-of-summarize-highlights/51341

tell application id "DNtp"
	
	if selection is {} then
		display dialog ("First select a document created by the command
Tools > Summarize Highlights
then rerun this script") buttons {"Ok"}
		return
	else
		
		repeat with thisRecord in (selection as list)
			
			--make new summary document
			set rtfText to "" as styled text
			set newFileName to filename of thisRecord as string
			set newFileName to characters 1 thru ((count newFileName) - 12) of (newFileName as text)
			set newFileName to newFileName & " Highlights"
			set newFileName to (newFileName as string)
			set newRecord to create record with {name:newFileName, type:rtf, rich text:rtfText} in current group
			tell text of newRecord
				make new paragraph at end with data newFileName & return
				tell last paragraph
					set its font to "Georgia Bold"
				end tell
			end tell
			
			--process existing summary document
			--link details are remembered here until the next highlight needs them;
			--start empty so a highlight that appears before any link does not raise an error
			set thisLink to ""
			set thisLinkText to ""
			set thisLinkTextSize to 0
			tell text of thisRecord
				repeat with thisRun in attribute runs
					
					--determine what kind of run this is: heading, highlight, link, or blank
					set thisRunType to ""
					if font of thisRun contains "Bold" then set thisRunType to "heading"
					set theBackground to ((background of thisRun) as string)
					if theBackground is not "" then set thisRunType to "highlight"
					if exists URL of thisRun then set thisRunType to "link"
					
					set thisText to my trimText(thisRun as string)
					
					if thisRunType = "heading" then
						tell text of newRecord
							make new paragraph at end with data return
							make new paragraph at end with data thisText
							tell last paragraph
								set properties to {font:"Georgia Bold", size:11}
							end tell
							make new paragraph at end with data return
						end tell
						
					else if thisRunType = "highlight" then
						tell text of newRecord
							make new paragraph at end with data thisText & thisLinkText & return
							--only attach a link if one was seen before this highlight
							if thisLink is not "" then set URL of characters (0 - thisLinkTextSize) thru -1 to thisLink
						end tell
						
					else if thisRunType = "link" then
						set thisLink to my trimText(URL of thisRun as string)
						set thisLinkText to characters 6 thru -1 of thisText
						set thisLinkTextSize to (count of thisLinkText) + 1
						set thisLinkText to "  - " & thisLinkText
						--this is not added to new doc yet, it is saved for the next highlight
					else
						--if blank, do nothing			
					end if
					
				end repeat
			end tell
			
			--final cleanup of text styles
			tell text of newRecord
				set its size to 11
				set size of paragraph 1 to 14
				set (font of every attribute run whose (font contains "Bold")) to "Georgia Bold"
				set (font of every attribute run whose (font does not contain "Bold" and font does not contain "Italic")) to "Georgia"
				--five properties, five values (the original list carried one extra 0)
				set {alignment, line spacing, paragraph spacing, minimum line height, maximum line height} to {left, 4, 8, 0, 0}
			end tell
			
		end repeat
	end if
end tell

on trimText(textToTrim)
	set ret to ""
	tell application id "DNtp"
		set wordCount to count words of (textToTrim as string)
		if wordCount = 0 then
			set ret to ""
		else
			set ret to texts from first word to last word of (textToTrim as text)
		end if
	end tell
	return ret
end trimText

jrv · November 3, 2019, 7:58pm

Thanks for highlighting some bugs. I’ll update my code when I have some time this week.

Ultimately, I’d like to avoid the parsing by having DEVONthink implement an AppleScript method that outputs an array of header/highlight/page pairs that a script can then process. Hopefully the developers would be open to exposing this functionality.

I’ll post my updated script once I debug it further.

J

gwc · July 23, 2020, 3:11pm

Thanks a lot guys.

I know this is a stupid question, but I’m not easily finding an answer. How do I run this script? What are the steps to put this into DT and get it working? I haven’t added scripts before.

jrv · July 23, 2020, 4:05pm

Since I uploaded my script last time, I have completely rewritten it in Java for better extraction. Attached is a zip with the files.

Extract Annotations - AppleScript, goes into the DEVONthink scripts folder
extractPDFAnnotations - shell script that calls Java, goes into /usr/local/bin
extractHighlights.jar - Java archive, goes into /usr/local/bin

You can customise the output and parameters in the AppleScript file.

Hope it works,
Jason

summarizeHighlights.zip (8.2 MB)

gwc · July 24, 2020, 10:13am

Awesome. Thanks, Jason. I’ll give it a go!

Edit:

I can’t find the /usr/local/bin folder. I’ve tried CMD + SHIFT + G and I’ve tried the terminal command. Neither finds anything.

Also, I presume the DEVONthink scripts folder is the one you access through the “Open Scripts Folder” command, which has four folders in it (Reminders, Smart Rules, Toolbar, Menu)?

chrillek · July 24, 2020, 2:37pm

Re /usr/local/bin: you can just create it. As far as I gathered from the rest of this thread, you’ll also need Java to be installed.

The different script folders are described in DT’s documentation.

jrv · July 24, 2020, 6:21pm

You can create the /usr/local/bin with the terminal app if you don’t already have one: mkdir /usr/local then mkdir /usr/local/bin

You’ll need to install Java from Oracle:
https://www.oracle.com/java/technologies/javase/jdk-jre-macos-catalina.html
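Before installing, it may be worth checking whether a working Java is already present. The following is only a sketch: `runs_ok` is a hypothetical helper name, and it assumes that `java -version` exits with a non-zero status when no runtime is installed (on macOS a missing runtime may also trigger a system prompt).

```shell
#!/bin/sh
# Sketch only. runs_ok is a hypothetical helper, not part of the attached zip.
# It succeeds if the given command runs and exits with status 0.
runs_ok() {
	"$@" >/dev/null 2>&1
}

# Assumption: "java -version" fails when no Java runtime is installed.
if runs_ok java -version; then
	echo "Java is available"
else
	echo "Java does not appear to be installed"
fi
```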

Install the AppleScript into the DEVONthink scripts folder. The easiest way is to use the script menu, then Open Scripts Folder.

J

sjk · July 24, 2020, 7:11pm

mkdir -p /usr/local/bin, which will fail if permission is insufficient.

gwc · July 25, 2020, 12:03am

Ok, yeah, permission is denied. I’ve tried a sudo command to bypass it and gain permission but nothing is working.

Edit: Apparently from Catalina onwards Apple has implemented a “System Integrity Protection” which, by my understanding, makes all this unnecessarily difficult. Apparently you can disable SIP but I don’t think that’s recommended.

Here’s a few details on the matter:

jrv · July 25, 2020, 12:51am

That’s annoying.

The location of the shell script really doesn’t matter in the end, you’ll just need to change the path of the shell script at the top of the AppleScript file.
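A rough sketch of that alternative, assuming a folder you own such as ~/bin. Both `INSTALL_DIR` and `SRC_DIR` are placeholder names and paths chosen for illustration; only the two file names come from the zip described earlier in this thread.

```shell
#!/bin/sh
# Sketch only: put the helper files in a folder you own instead of /usr/local/bin.
# INSTALL_DIR and SRC_DIR are assumptions for illustration; adjust both.
INSTALL_DIR="${INSTALL_DIR:-$HOME/bin}"
SRC_DIR="${SRC_DIR:-$HOME/Downloads/summarizeHighlights}"

# No sudo needed, since the folder lives inside the home directory.
mkdir -p "$INSTALL_DIR"

# File names as listed earlier in this thread.
for f in extractPDFAnnotations extractHighlights.jar; do
	if [ -f "$SRC_DIR/$f" ]; then
		cp "$SRC_DIR/$f" "$INSTALL_DIR/"
	else
		echo "not found, skipping: $SRC_DIR/$f" >&2
	fi
done

# Make the shell script executable in case the AppleScript runs it directly.
if [ -f "$INSTALL_DIR/extractPDFAnnotations" ]; then
	chmod +x "$INSTALL_DIR/extractPDFAnnotations"
fi

echo "Path to set at the top of the AppleScript: $INSTALL_DIR/extractPDFAnnotations"
```

If the shell script refers to the .jar by an absolute path, that path would presumably need the same change.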