Extracting highlighted text is a key part of my knowledge workflow. Until now I have used Zotero and ZotFile for this, but Zotero’s focus on citation management rather than knowledge management has limited its usefulness. I’ve purchased DT3 with the intention of migrating my workflow to it. The problem is that Summarize Highlights produces a format that I’m not comfortable with, and one that does not lend itself to easy export to other tools like OmniOutliner.
There have been several discussion threads on this topic without a satisfactory resolution. The best recommendation was to create an AppleScript to parse attribute runs. I spent several hours trying to write a script, but I’ve been unsuccessful. While I have coding skills, I’m largely unfamiliar with AppleScript, and there is very little help on the wider web on how to use the Text Suite to extract and format RTF. Most importantly, I feel that any script I write now will need to be continually updated as the format of Summarize Highlights changes in future releases. I’d suggest that parsing this output is a hack at best. Is there a better way to accomplish this that I’m missing?
Going forward, a template solution would be best. Releasing a highlights AppleScript that we could modify to our own needs would also work. Until one of these solutions emerges, is there something I can do now to reformat the output of Summarize Highlights?
I’d use Skim - it has great built-in templating for exporting notes, is free, and Skim annotations are supported in DT (Mac only but easy to convert to standard PDF annotations if you need them on iOS).
Cool, thanks for posting it. Would you mind pasting the code into the forum itself? When I try to grab the text from the PDF, the linebreaks get all messed up. Just put [code] before the code, and end with [/code].
--Script: Convert Summary
--Version: 1.0
--Author: Jason Virtue
tell application id "DNtp"
    if selection is not {} then
        repeat with thisRecord in (selection as list)
            set rtfText to "" as styled text
            set newFileName to filename of thisRecord as string
            set newFileName to characters 1 thru ((count newFileName) - 12) of (newFileName as text)
            set newFileName to newFileName & " Highlights"
            set newFileName to (newFileName as string)
            set newRecord to create record with {name:newFileName, type:rtf, rich text:rtfText} in current group
            tell text of thisRecord
                set pdfLink to ""
                set linkText to ""
                set highlight to ""
                set pdfNote to ""
                set pdfLinkBuffer to ""
                set linkTextBuffer to ""
                set highlightBuffer to ""
                set pdfNoteBuffer to ""
                set lastItemWasHighlight to false
                set lastHighlight to ""
                set isHighlight to false
                repeat with parasOfText in attribute runs
                    set theBackground to ((background of parasOfText) as string)
                    set isHighlight to false
                    if theBackground is not "" then
                        set isHighlight to true
                        set highlight to highlight & my trimText(parasOfText as string)
                    else if exists URL of parasOfText then
                        set pdfLink to my trimText(URL of parasOfText as string)
                        set linkText to my trimText(parasOfText as string)
                    else
                        set pdfNoteString to my trimText(parasOfText as string)
                        if pdfNoteString is not "" then
                            set pdfNote to pdfNoteString
                        else
                            --hack
                            set isHighlight to true
                        end if
                    end if
                    tell text of newRecord
                        --make new paragraph at end with data return & theBackground & return
                    end tell
                    if lastItemWasHighlight is true and isHighlight is false then
                        if highlight is not "" then
                            if pdfNote is "JOIN" then
                                set highlight to highlightBuffer & " " & highlight
                                set pdfNote to ""
                            else
                                if highlightBuffer is not "" then
                                    my outputHighlight(newRecord, pdfNoteBuffer, highlightBuffer, linkTextBuffer, pdfLinkBuffer)
                                end if
                                set pdfLinkBuffer to pdfLink
                                set linkTextBuffer to linkText
                                set highlightBuffer to highlight
                                set pdfNoteBuffer to pdfNote
                                set highlight to ""
                                set pdfNote to ""
                            end if
                        end if
                    end if
                    if isHighlight then
                        set lastItemWasHighlight to true
                    else
                        set lastItemWasHighlight to false
                    end if
                end repeat
            end tell
            if highlightBuffer is not "" then
                my outputHighlight(newRecord, pdfNoteBuffer, highlightBuffer, linkTextBuffer, pdfLinkBuffer)
            end if
            if highlight is not "" then
                my outputHighlight(newRecord, pdfNote, highlight, linkText, pdfLink)
            end if
            tell text of newRecord
                set its size to 12
            end tell
        end repeat
    end if
end tell

on trimText(textToTrim)
    set ret to ""
    tell application id "DNtp"
        set wordCount to count words of (textToTrim as string)
        if wordCount = 0 then
            set ret to ""
        else
            set ret to texts from first word to last word of (textToTrim as text)
        end if
    end tell
    return ret
end trimText

on outputHighlight(newRecord, pdfNote, highlight, linkText, pdfLink)
    tell application id "DNtp"
        tell text of newRecord
            set charCount to 0
            if pdfNote is not "" then
                make new paragraph at end with data pdfNote
                set charCount to (-1 * (count pdfNote))
                tell characters (charCount) thru -1 to set its font to "Helvetica Neue Bold"
            end if
            make new paragraph at end with data ": "
            if highlight is not "" then
                make new paragraph at end with data highlight
                set charCount to (-1 * (count highlight))
                tell characters (charCount) thru -1 to set its font to "Helvetica Neue"
            end if
            if pdfLink is not "" then
                set linkTest to " ((" & linkText & "))"
                make new paragraph at end with data linkTest
                set charCount to (-1 * (count linkTest))
                set URL of characters (charCount + 3) thru (-1 - 2) to pdfLink
                --set the URL of characters -4 thru -1 to "http://www.google.com"
                set font of characters (charCount) thru -1 to "Helvetica Neue"
            end if
            make new paragraph at end with data return
        end tell
    end tell
end outputHighlight
Hmm, I just get an error every time at the repeat with parasOfText … line:
error "DEVONthink 3 got an error: Can’t make every attribute run of every text of content id 10672 of database id 2 into type string." number -1700 from every attribute run of every text of content id 10672 of database id 2 to string
this doesn’t work either:
tell application id "DNtp"
    set t to item 1 of (selection as list)
    tell text of t
        attribute runs
    end tell
end tell
I’m trying it on files with native PDF annotations that are viewable in DT, Preview and Highlights.
You have to run it on the RTF that “Summarize Highlights” produces.
I haven’t as of yet found a way to link the whole process together in a seamless way. I remember from another thread that 3.0.2 will correct an error with smart rules that will permit this.
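Since the script only works on the RTF summary and not on the annotated PDF itself, a guard at the top of the loop could catch a wrong selection before the attribute runs are read. This is a minimal sketch, not part of the posted script. It assumes that DEVONthink 3 reports the type of a summary document as rtf (or rtfd), and the dialog wording is only illustrative.

```applescript
tell application id "DNtp"
    repeat with thisRecord in (selection as list)
        -- Assumption: documents made by Summarize Highlights report their type as rtf (or rtfd).
        set recordType to type of thisRecord
        if recordType is rtf or recordType is rtfd then
            -- Safe to walk the attribute runs of this record here.
        else
            display dialog "Skipping “" & (name of thisRecord) & "”: run Tools > Summarize Highlights on it first, then select the summary." buttons {"OK"} default button 1
        end if
    end repeat
end tell
```

With a check like this, selecting a PDF by mistake produces a message instead of the -1700 coercion error quoted above.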
Hmm, I’m finding that the script is not assigning line numbers correctly in a couple of ways:
The first link is not being used: the first highlight is using the link for the second highlight, and the last highlight just repeats the second-to-last link.
See how the first highlight, “the natural world…”, has the link to line 14 rather than line 11.
Also if you create one highlight summary from multiple documents (shift select the docs, then run “summarize highlights”), and run the script on that, the last highlight for each document gets moved to the document that follows it:
To fix the glitch I encountered, I took a stab at revising your script, and ended up streamlining how it runs as well. This code is not as robust as yours, but since the highlight summary docs have such consistent formatting, I could make some assumptions about what goes where.
This will work with summaries made from single files, or multiple files.
--Script: Convert Formatting of Highlight Summary
--Author: Jason Virtue, with revisions by Daniel Sroka
--https://discourse.devontechnologies.com/t/customize-the-output-of-summarize-highlights/51341
tell application id "DNtp"
    if selection is {} then
        display dialog "First select a document created by the command" & return & "Tools > Summarize Highlights" & return & "then rerun this script" buttons {"OK"}
        return
    else
        repeat with thisRecord in (selection as list)
            --make new summary document
            set rtfText to "" as styled text
            set newFileName to filename of thisRecord as string
            set newFileName to characters 1 thru ((count newFileName) - 12) of (newFileName as text)
            set newFileName to newFileName & " Highlights"
            set newFileName to (newFileName as string)
            set newRecord to create record with {name:newFileName, type:rtf, rich text:rtfText} in current group
            tell text of newRecord
                make new paragraph at end with data newFileName & return
                tell last paragraph
                    set its font to "Georgia Bold"
                end tell
            end tell
            --no link has been seen yet; avoids an undefined-variable error
            --if a highlight appears before the first link
            set thisLink to ""
            set thisLinkText to ""
            set thisLinkTextSize to 0
            --process existing summary document
            tell text of thisRecord
                repeat with thisRun in attribute runs
                    --determine what kind of run this is: heading, highlight, link, or blank
                    set thisRunType to ""
                    if font of thisRun contains "Bold" then set thisRunType to "heading"
                    set theBackground to ((background of thisRun) as string)
                    if theBackground is not "" then set thisRunType to "highlight"
                    if exists URL of thisRun then set thisRunType to "link"
                    set thisText to my trimText(thisRun as string)
                    if thisRunType = "heading" then
                        tell text of newRecord
                            make new paragraph at end with data return
                            make new paragraph at end with data thisText
                            tell last paragraph
                                set properties to {font:"Georgia Bold", size:11}
                            end tell
                            make new paragraph at end with data return
                        end tell
                    else if thisRunType = "highlight" then
                        tell text of newRecord
                            make new paragraph at end with data thisText & thisLinkText & return
                            --only attach a link if one preceded this highlight
                            if thisLink is not "" then
                                set URL of characters (0 - thisLinkTextSize) thru -1 to thisLink
                            end if
                        end tell
                    else if thisRunType = "link" then
                        set thisLink to my trimText(URL of thisRun as string)
                        set thisLinkText to characters 6 thru -1 of thisText
                        set thisLinkTextSize to (count of thisLinkText) + 1
                        set thisLinkText to " - " & thisLinkText
                        --this is not added to new doc yet, it is saved for the next highlight
                    else
                        --if blank, do nothing
                    end if
                end repeat
            end tell
            --final cleanup of text styles
            tell text of newRecord
                set its size to 11
                set size of paragraph 1 to 14
                set (font of every attribute run whose (font contains "Bold")) to "Georgia Bold"
                set (font of every attribute run whose (font does not contain "Bold" and font does not contain "Italic")) to "Georgia"
                --five properties, five values (the original list had a stray sixth value)
                set {alignment, line spacing, paragraph spacing, minimum line height, maximum line height} to {left, 4, 8, 0, 0}
            end tell
        end repeat
    end if
end tell

on trimText(textToTrim)
    set ret to ""
    tell application id "DNtp"
        set wordCount to count words of (textToTrim as string)
        if wordCount = 0 then
            set ret to ""
        else
            set ret to texts from first word to last word of (textToTrim as text)
        end if
    end tell
    return ret
end trimText
Thanks for highlighting some bugs. I’ll update my code when I have some time this week.
Ultimately, I’d like to avoid the parsing by having DEVONthink implement an AppleScript method that outputs an array of header/highlight/page pairs that a script can then process. Hopefully the developers would be open to exposing this functionality.
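To make the request concrete, here is a rough sketch of what a script could do with such an array. DEVONthink does not provide this method, so the list is built by hand, and the record labels sectionTitle, highlightText and pageNumber are invented purely for illustration.

```applescript
-- Hypothetical data: hand-built here, NOT returned by any DEVONthink command.
set theHighlights to {¬
    {sectionTitle:"Document A", highlightText:"First highlighted passage", pageNumber:11}, ¬
    {sectionTitle:"Document A", highlightText:"Second highlighted passage", pageNumber:14}}

-- Turn each record into one plain-text line: the highlight followed by its page.
set outputLines to {}
repeat with h in theHighlights
    set end of outputLines to (highlightText of h) & " (p. " & (pageNumber of h) & ")"
end repeat

-- Join the lines, restoring the delimiters afterwards.
set oldDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set outputText to outputLines as text
set AppleScript's text item delimiters to oldDelimiters
return outputText
```

With structured data like this, reformatting for OmniOutliner or any other tool becomes a matter of changing one line in the loop, with no dependence on fonts or background colours in the summary RTF.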
I’ll post my updated script once I debug it further.
I know this is a stupid question, but I’m not easily finding an answer. How do I run this script? What are the steps to put this into DT and get it working? I haven’t added scripts before.
Since I uploaded my script last time, I have completely rewritten it in Java for better extraction. Attached is a zip with the files.
Extract Annotations - AppleScript that goes into the DEVONthink scripts folder
extractPDFAnnotations - shell script that calls Java; goes into /usr/local/bin
extractHighlights.jar - Java file that goes into /usr/local/bin
You can customise the output and parameters in the AppleScript file.
I can’t find the usr/local/bin folder. I’ve tried CMD + SHIFT + G and I’ve tried the terminal command. Doesn’t find anything.
Also, the devonthink scripts folder I presume is the one that you access through “Open Scripts Folder” command which has four folders in it (Reminders, Smart Rules, Toolbar, Menu)?
Ok, yeah, permission is denied. I’ve tried a sudo command to bypass it and gain permission but nothing is working.
Edit: Apparently from Catalina onwards Apple has implemented a “System Integrity Protection” which, by my understanding, makes all this unnecessarily difficult. Apparently you can disable SIP but I don’t think that’s recommended.
The location of the shell script really doesn’t matter in the end, you’ll just need to change the path of the shell script at the top of the AppleScript file.
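As a rough illustration of that change, the snippet below points a path variable at a folder inside the home directory and checks that the shell script is actually there. The variable name extractScriptPath and the ~/bin location are assumptions for this sketch; the real variable at the top of the Extract Annotations file may be named differently.

```applescript
-- Hypothetical location and variable name; adjust to wherever you saved the shell script.
set extractScriptPath to (POSIX path of (path to home folder)) & "bin/extractPDFAnnotations"

try
    -- "test -x" exits non-zero if the file is missing or not executable,
    -- which makes do shell script raise an error.
    do shell script "test -x " & quoted form of extractScriptPath
on error
    display dialog "No executable shell script found at " & extractScriptPath buttons {"OK"} default button 1
end try
```

If the shell script in turn refers to the jar by a fixed path (not confirmed here), that path would need the same adjustment.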