Cool, thanks for posting it. Would you mind pasting the code into the forum itself? When I try to grab the text from the PDF, the line breaks get all messed up. Just put [code] before the code, and end with [/code].
Thanks for pointing out the code command:
--Script: Convert Summary
--Version: 1.0
--Author: Jason Virtue
tell application id "DNtp"
    if selection is not {} then
        repeat with thisRecord in (selection as list)
            set rtfText to "" as styled text
            set newFileName to filename of thisRecord as string
            set newFileName to characters 1 thru ((count newFileName) - 12) of (newFileName as text)
            set newFileName to newFileName & " Highlights"
            set newFileName to (newFileName as string)
            set newRecord to create record with {name:newFileName, type:rtf, rich text:rtfText} in current group
            tell text of thisRecord
                set pdfLink to ""
                set linkText to ""
                set highlight to ""
                set pdfNote to ""
                set pdfLinkBuffer to ""
                set linkTextBuffer to ""
                set highlightBuffer to ""
                set pdfNoteBuffer to ""
                set lastItemWasHighlight to false
                set lastHighlight to ""
                set isHighlight to false
                repeat with parasOfText in attribute runs
                    set theBackground to ((background of parasOfText) as string)
                    set isHighlight to false
                    if theBackground is not "" then
                        set isHighlight to true
                        set highlight to highlight & my trimText(parasOfText as string)
                    else if exists URL of parasOfText then
                        set pdfLink to my trimText(URL of parasOfText as string)
                        set linkText to my trimText(parasOfText as string)
                    else
                        set pdfNoteString to my trimText(parasOfText as string)
                        if pdfNoteString is not "" then
                            set pdfNote to pdfNoteString
                        else
                            --hack
                            set isHighlight to true
                        end if
                    end if
                    tell text of newRecord
                        --make new paragraph at end with data return & theBackground & return
                    end tell
                    if lastItemWasHighlight is true and isHighlight is false then
                        if highlight is not "" then
                            if pdfNote is "JOIN" then
                                set highlight to highlightBuffer & " " & highlight
                                set pdfNote to ""
                            else
                                if highlightBuffer is not "" then
                                    my outputHighlight(newRecord, pdfNoteBuffer, highlightBuffer, linkTextBuffer, pdfLinkBuffer)
                                end if
                                set pdfLinkBuffer to pdfLink
                                set linkTextBuffer to linkText
                                set highlightBuffer to highlight
                                set pdfNoteBuffer to pdfNote
                                set highlight to ""
                                set pdfNote to ""
                            end if
                        end if
                    end if
                    if isHighlight then
                        set lastItemWasHighlight to true
                    else
                        set lastItemWasHighlight to false
                    end if
                end repeat
            end tell
            if highlightBuffer is not "" then
                my outputHighlight(newRecord, pdfNoteBuffer, highlightBuffer, linkTextBuffer, pdfLinkBuffer)
            end if
            if highlight is not "" then
                my outputHighlight(newRecord, pdfNote, highlight, linkText, pdfLink)
            end if
            tell text of newRecord
                set its size to 12
            end tell
        end repeat
    end if
end tell
on trimText(textToTrim)
    set ret to ""
    tell application id "DNtp"
        set wordCount to count words of (textToTrim as string)
        if wordCount = 0 then
            set ret to ""
        else
            set ret to texts from first word to last word of (textToTrim as text)
        end if
    end tell
    return ret
end trimText
on outputHighlight(newRecord, pdfNote, highlight, linkText, pdfLink)
    tell application id "DNtp"
        tell text of newRecord
            set charCount to 0
            if pdfNote is not "" then
                make new paragraph at end with data pdfNote
                set charCount to (-1 * (count pdfNote))
                tell characters (charCount) thru -1 to set its font to "Helvetica Neue Bold"
            end if
            make new paragraph at end with data ": "
            if highlight is not "" then
                make new paragraph at end with data highlight
                set charCount to (-1 * (count highlight))
                tell characters (charCount) thru -1 to set its font to "Helvetica Neue"
            end if
            if pdfLink is not "" then
                set linkTest to " ((" & linkText & "))"
                make new paragraph at end with data linkTest
                set charCount to (-1 * (count linkTest))
                set URL of characters (charCount + 3) thru (-1 - 2) to pdfLink
                --set the URL of characters -4 thru -1 to "http://www.google.com"
                set font of characters (charCount) thru -1 to "Helvetica Neue"
            end if
            make new paragraph at end with data return
        end tell
    end tell
end outputHighlight
Hmm I just get an error every time at the repeat with parasOfText … line
error "DEVONthink 3 got an error: Can’t make every attribute run of every text of content id 10672 of database id 2 into type string." number -1700 from every attribute run of every text of content id 10672 of database id 2 to string
This doesn’t work either:
tell application id "DNtp"
    set t to item 1 of (selection as list)
    tell text of t
        attribute runs
    end tell
end tell
I’m trying it on files with native PDF annotations that are viewable in DT, Preview and Highlights.
You have to run it on the RTF that “Summarize Highlights” produces.
I haven’t as of yet found a way to link the whole process together in a seamless way. I remember from another thread that 3.0.2 will correct an error with smart rules that will permit this.
Jason
Oh very cool. Nice solution!
I especially like how it removes the highlighting and moves the line number to the end of each highlight.
And ah, so that’s how you make a new RTF file with AppleScript!
Good work!
Hmm, I’m finding that the script is not assigning line numbers correctly in a couple of ways:
The first link is not being used for the first highlight, which instead uses the link for the second highlight; the last highlight then just repeats the second-to-last link.
See how the first highlight “the natural world…” doesn’t have the link to line 11, but line 14.
Also if you create one highlight summary from multiple documents (shift select the docs, then run “summarize highlights”), and run the script on that, the last highlight for each document gets moved to the document that follows it:
See how line 17 (“we find nature to be…”) is appearing not at the end of the document “Emerson 1” but at the beginning of the document “Emerson 2”.
To fix that glitch I encountered, I took a stab at revising your script. I ended up streamlining how it ran as well. This code is not as robust as yours, but since the summary highlight docs have such a consistent formatting, I could make some assumptions about what goes where.
This will work with summaries made from single files, or multiple files.
--Script: Convert Formatting of Highlight Summary
--Author: Jason Virtue, with revisions by Daniel Sroka
--https://discourse.devontechnologies.com/t/customize-the-output-of-summarize-highlights/51341
tell application id "DNtp"
    if selection is {} then
        display dialog ("First select a document created by the command
Tools > Summarize Highlights
then rerun this script") buttons {"Ok"}
        return
    else
        repeat with thisRecord in (selection as list)
            --make new summary document
            set rtfText to "" as styled text
            set newFileName to filename of thisRecord as string
            set newFileName to characters 1 thru ((count newFileName) - 12) of (newFileName as text)
            set newFileName to newFileName & " Highlights"
            set newFileName to (newFileName as string)
            set newRecord to create record with {name:newFileName, type:rtf, rich text:rtfText} in current group
            tell text of newRecord
                make new paragraph at end with data newFileName & return
                tell last paragraph
                    set its font to "Georgia Bold"
                end tell
            end tell
            --process existing summary document
            tell text of thisRecord
                repeat with thisRun in attribute runs
                    --determine what kind of run this is: heading, highlight, link, or blank
                    set thisRunType to ""
                    if font of thisRun contains "Bold" then set thisRunType to "heading"
                    set theBackground to ((background of thisRun) as string)
                    if theBackground is not "" then set thisRunType to "highlight"
                    if exists URL of thisRun then set thisRunType to "link"
                    set thisText to my trimText(thisRun as string)
                    if thisRunType = "heading" then
                        tell text of newRecord
                            make new paragraph at end with data return
                            make new paragraph at end with data thisText
                            tell last paragraph
                                set properties to {font:"Georgia Bold", size:11}
                            end tell
                            make new paragraph at end with data return
                        end tell
                    else if thisRunType = "highlight" then
                        tell text of newRecord
                            make new paragraph at end with data thisText & thisLinkText & return
                            set URL of characters (0 - thisLinkTextSize) thru -1 to thisLink
                        end tell
                    else if thisRunType = "link" then
                        set thisLink to my trimText(URL of thisRun as string)
                        set thisLinkText to characters 6 thru -1 of thisText
                        set thisLinkTextSize to (count of thisLinkText) + 1
                        set thisLinkText to " - " & thisLinkText
                        --this is not added to new doc yet, it is saved for the next highlight
                    else
                        --if blank, do nothing
                    end if
                end repeat
            end tell
            --final cleanup of text styles
            tell text of newRecord
                set its size to 11
                set size of paragraph 1 to 14
                set (font of every attribute run whose (font contains "Bold")) to "Georgia Bold"
                set (font of every attribute run whose (font does not contain "Bold" and font does not contain "Italic")) to "Georgia"
                set {alignment, line spacing, paragraph spacing, minimum line height, maximum line height} to {left, 4, 8, 0, 0, 0}
            end tell
        end repeat
    end if
end tell
on trimText(textToTrim)
    set ret to ""
    tell application id "DNtp"
        set wordCount to count words of (textToTrim as string)
        if wordCount = 0 then
            set ret to ""
        else
            set ret to texts from first word to last word of (textToTrim as text)
        end if
    end tell
    return ret
end trimText
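For anyone who wants to see the pairing rule from the revised script in isolation, here is a rough sketch outside DEVONthink: each link is remembered and then attached to the next highlight that follows it. The tab-separated input format and the function name pair_links are invented for this illustration; the real script walks attribute runs of the summary RTF instead.

```shell
#!/bin/sh
# Toy model of the pairing rule: a "link" line is remembered and attached
# to the next "highlight" line. The tab-separated input (type, then text)
# is a made-up format for illustration, not something DEVONthink produces.
pair_links() {
    awk -F '\t' '
        $1 == "heading"   { print "# " $2 }          # headings pass straight through
        $1 == "link"      { link = $2 }              # remember the most recent link
        $1 == "highlight" { print $2 " - " link }    # emit highlight with that link
    '
}
```

Feeding it lines such as a heading, then alternating link and highlight rows, should show each highlight carrying the link that came just before it, which is the assumption the revised script relies on.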
Thanks for highlighting some bugs. I’ll update my code when I have some time this week.
Ultimately, I’d like to avoid the parsing by having DEVONthink implement an AppleScript method that outputs an array of header/highlight/page pairs that a script can then process. Hopefully the developers would be open to exposing this functionality.
I’ll post my updated script once I debug it further.
J
Thanks a lot guys.
I know this is a stupid question, but I’m not easily finding an answer. How do I run this script? What are the steps to put this into DT and get it working? I haven’t added scripts before.
Since I uploaded my script last time, I completely rewrote it in java for better extraction. Attached is a zip with the files.
Extract Annotations - AppleScript, goes into the DEVONthink scripts dir
extractPDFAnnotations - shell script that calls Java, goes into /usr/local/bin
extractHighlights.jar - Java file that goes into /usr/local/bin
You can customise the output and parameters in the AppleScript file.
Hope it works,
Jason
summarizeHighlights.zip (8.2 MB)
Awesome. Thanks, Jason. I’ll give it a go!
Edit:
I can’t find the /usr/local/bin folder. I’ve tried CMD + SHIFT + G and I’ve tried the terminal command. Neither finds anything.
Also, the devonthink scripts folder I presume is the one that you access through “Open Scripts Folder” command which has four folders in it (Reminders, Smart Rules, Toolbar, Menu)?
Re /usr/local/bin: you can just create it. As far as I gathered from the rest of this thread, you’ll also need Java to be installed.
The different script folders are described in DT’s documentation.
You can create /usr/local/bin with the Terminal app if you don’t already have one: mkdir /usr/local then mkdir /usr/local/bin
You’ll need to install Java from Oracle:
https://www.oracle.com/java/technologies/javase/jdk-jre-macos-catalina.html
Install the AppleScript into the DEVONthink scripts folder. The easiest way is to use the script menu, then Open Scripts Folder.
J
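The install steps above could be double-checked with something like the following. This is only a sketch: BIN_DIR, check_file and check_install are names made up here, and the default directory is simply the one mentioned in this thread.

```shell
#!/bin/sh
# Report whether the pieces mentioned in the thread are where the scripts
# expect them. BIN_DIR is an assumption; set it to wherever the files live.
BIN_DIR="${BIN_DIR:-/usr/local/bin}"

# Print "ok: PATH" if the file is readable, otherwise "missing: PATH".
check_file() {
    if [ -r "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check for a java command on PATH plus the two files from the zip.
check_install() {
    status=0
    command -v java >/dev/null 2>&1 || { echo "missing: java"; status=1; }
    check_file "$BIN_DIR/extractPDFAnnotations" || status=1
    check_file "$BIN_DIR/extractHighlights.jar" || status=1
    return $status
}
```

Running check_install in Terminal after copying the files lists anything that is still missing. On macOS a placeholder java command may exist even without a JDK installed, so running java -version yourself is probably the more reliable test for that part.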
mkdir -p /usr/local/bin, which will fail if permission is insufficient.
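A slightly more talkative version of that command, as a sketch (ensure_dir is a made-up helper name): it creates the directory if it can, and otherwise prints the sudo form to try.

```shell
#!/bin/sh
# Hypothetical helper: create a directory (and any missing parents), or
# explain how to retry with sudo when permission is insufficient.
ensure_dir() {
    dir="$1"
    if [ -d "$dir" ]; then
        echo "exists: $dir"
    elif mkdir -p "$dir" 2>/dev/null; then
        echo "created: $dir"
    else
        echo "cannot create $dir; try: sudo mkdir -p $dir" >&2
        return 1
    fi
}
```

Calling ensure_dir /usr/local/bin would be the case discussed here; any other folder you have write access to works the same way, as long as the paths in the scripts are changed to match.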
Ok, yeah, permission is denied. I’ve tried a sudo command to bypass it and gain permission but nothing is working.
Edit: Apparently from Catalina onwards Apple has implemented a “System Integrity Protection” which, by my understanding, makes all this unnecessarily difficult. Apparently you can disable SIP but I don’t think that’s recommended.
Here’s a few details on the matter:
That’s annoying.
The location of the shell script really doesn’t matter in the end, you’ll just need to change the path of the shell script at the top of the AppleScript file.
Ok, I did that and then it stops because it can’t find the jarfile:
“Error: Unable to access jarfile /usr/local/bin/extractHighlights.jar”
I looked for the line of code to change regarding this one, like with the other path you mentioned, but can’t find it. I tried to place this jarfile in the same folder I moved the other one to, but it didn’t work.
It is in the only line of the shell script:
java -jar /usr/local/bin/extractHighlights.jar com.jrvirtue.pdf.ExtractHighlights "$@"
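If the files end up somewhere other than /usr/local/bin, one way to avoid editing that path by hand is to have the shell script look for the jar in its own folder. A minimal sketch, assuming the jar sits next to the shell script; jar_path_for is a made-up helper, not part of Jason's zip:

```shell
#!/bin/sh
# Given the path of the wrapper script, print where the jar would be if it
# sits in the same folder. Assumes the jar keeps the name from the thread.
jar_path_for() {
    script_dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd)
    echo "$script_dir/extractHighlights.jar"
}

# Inside the wrapper itself, the single line from the thread would become:
#   java -jar "$(jar_path_for "$0")" com.jrvirtue.pdf.ExtractHighlights "$@"
```

With that change only the path at the top of the AppleScript file needs to point at the new location of the shell script; the jar is then found wherever the two files are kept together.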
Ok, I think I’m getting confused between the shell script and the AppleScript. So it’s not in the AppleScript but I have to change it in the shell script, but I can’t open the shell script in the Script Editor and have to open it in Terminal? This is unnecessarily complicated for someone who hasn’t ever done this before, especially for something so simple.
You can probably open the script in TextEdit.
It’s quite possible that complicated tasks require complicated solutions. If the task were “so simple”, it could be solved by simple means.