I’ll put this here for now in case it saves others from writing the same code again. It’s not a turn-key solution yet for most people, but it might give you some ideas.
This code gets highlights and comments from both a PDF and a Markdown annotation file and merges them into one list. I presume Markdown frontmatter, but you can change the regex in getSections() in the Markdown Annotation helper to anything else; it splits a Markdown file into three sections: frontmatter, info (title and excerpt), and other content. The Markdown file should look like this:
---
date: 2022-05-08 22:43
url: https://annehelen.substack.com/p/youre-still-not-working-from-home
itemurl: x-devonthink-item://5C89CF00-7BD6-4FFC-AF02-3440C9EA7035
annotationurl: x-devonthink-item://A6B3D685-D6F5-4626-9197-1F47EFE6771F
path: you're still not working from home.pdf
tags: [00-review]
---
excerpt:: This is the Sunday edition of Culture Study — the newsletter from Anne Helen Petersen, which you can read about here. If you like it and want more like it in your inbox, consider subscribing. A very weird thing about writing books = you often start thinking about your next one as you’re in the process of promoting the current one. I finished the fact checking and copy editing process on
> Until recently, massive implementation of work from home seemed
Comment A
> This is the dark truth of the WFH Forever revolution. It promises to liberate workers from the chains of the oEce. But in practice, it capitalizes on the total collapse of work-life balance. We know this from experience; a5er more than six months of working from home, you also know this from experience.
Comment B
The annotations should be exactly as they appear when copy-pasting from the PDF, so any errors in the text layer are reproduced here too (and in this case there are quite a few). On the plus side, this makes it possible to use these text strings in a search command for x-devonthink-item URLs; see: Additional resources - Automatically capture and annotate items: DEVONthink helper, Smart rule scripts, JS/Markdown helper - #15 by mdbraber
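As a rough illustration of that idea, here is a minimal sketch that percent-encodes a highlight string and appends it to an item link. The handler name makeSearchURL is hypothetical, and the "?search=" parameter name is an assumption on my part; check it against the linked post and your DEVONthink version before relying on it.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper (not part of the scripts in this post):
-- percent-encode the highlight text and append it to an item link.
-- The "?search=" parameter name is an assumption.
on makeSearchURL(theItemURL, theHighlight)
	set theString to current application's NSString's stringWithString:theHighlight
	set theAllowed to current application's NSCharacterSet's URLQueryAllowedCharacterSet()
	set theEncoded to (theString's stringByAddingPercentEncodingWithAllowedCharacters:theAllowed) as text
	return theItemURL & "?search=" & theEncoded
end makeSearchURL

makeSearchURL("x-devonthink-item://5C89CF00-7BD6-4FFC-AF02-3440C9EA7035", "massive implementation of work from home")
```

Because the highlight text is copied verbatim from the PDF text layer, the same string that ends up in the Markdown file can be passed in unchanged.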
It can also write annotations back to the PDF (see addAnnotations in DEVONthink helper). I’ll be happy to add more context and a how-to later. I’m using these functions within Markdown Annotation helper scripts that do more things, hence the longer chain of functions to process Markdown from an annotation file. For more context, also see: Automatically capture and annotate items (to use with Obsidian)
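A minimal sketch of the write-back direction, assuming the DEVONthink helper is saved as a script library named "DEVONthink helper" and that the first selected record is a PDF. The sample item is taken from the example file above; each item is a {highlight, comment} pair, and the third argument is the highlight color index (1 to 7).

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use DT : script "DEVONthink helper" -- assumed name of the saved helper library

-- Each item is a {highlight text, comment} pair.
-- The highlight text must match the PDF text layer exactly.
set theItems to {{"Until recently, massive implementation of work from home seemed", "Comment A"}}

tell application id "DNtp"
	-- Assumption: the first selected record is the PDF to annotate.
	set theRecord to item 1 of (selection as list)
	set thePath to path of theRecord
end tell

-- The handler rewrites the PDF file in place, so try it on a copy first.
DT's addAnnotations(thePath, theItems, 1)
```

Passing an empty string as the comment adds a plain highlight without a popup note.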
To use these scripts you’ll also need the RegexAndStuff library: RegexAndStuffLib Script Library - AppleScript - Late Night Software Ltd.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"
use framework "Quartz"
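use script "RegexAndStuffLib" version "1.0.7"
use DT : script "DEVONthink helper" -- same library name the Markdown helper uses below
use ma : script "Markdown Annotation helper" -- assumed name; match whatever you saved that library as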
tell application id "DNtp"
repeat with theRecord in (selection as list)
set itemPath to path of theRecord
set maRecord to annotation of theRecord
set maText to plain text of maRecord
set maAnnotations to ma's getAnnotationsFromText(maText)
set annotationsMerged to DT's getAnnotationsMerged(itemPath, maAnnotations)
set theText to ""
repeat with theAnnotation in annotationsMerged
set theText to theText & "> " & theAnnotation's highlight & "\n\n"
if theAnnotation's comment is not missing value and theAnnotation's comment ≠ "" then set theText to theText & theAnnotation's comment & "\n\n"
end repeat
log theText
end repeat
end tell
DEVONthink helper functions
on _processAnnotations(thePath, theItems, theHighlightColorIndex, boolAdd)
try
set thePDF to current application's PDFDocument's alloc()'s initWithURL:(current application's |NSURL|'s fileURLWithPath:thePath)
set theItems to (current application's NSArray's arrayWithArray:theItems)
set theSelectionsArray to current application's NSMutableArray's new()
set theAnnotationsArray to current application's NSMutableArray's new()
repeat with i from 0 to ((theItems's |count|()) - 1)
set thisItem to (theItems's objectAtIndex:i)
set thisSearchTerm to ((current application's NSArray's arrayWithArray:thisItem)'s objectAtIndex:0)
set thisComment to ((current application's NSArray's arrayWithArray:thisItem)'s objectAtIndex:1)
set theResultSelections to (thePDF's findString:thisSearchTerm withOptions:0)
if theResultSelections's |count|() ≠ 0 then
set theSelectionsArray to my _addSelectionToSelectionsArray(thisSearchTerm, theResultSelections, thisComment, theSelectionsArray)
else
set thisSearchTerm_Components to (thisSearchTerm's componentsSeparatedByString:space)
repeat with j from 0 to ((thisSearchTerm_Components's |count|()) - 1)
set theseSearchTerm_Components_1 to (thisSearchTerm_Components's subarrayWithRange:{0, (thisSearchTerm_Components's |count|()) - j})
set thisSearchTerm_Part_1 to (theseSearchTerm_Components_1's componentsJoinedByString:space)
set theResultSelections to (thePDF's findString:thisSearchTerm_Part_1 withOptions:0)
if theResultSelections's |count|() ≠ 0 then
set theSelectionsArray to my _addSelectionToSelectionsArray(thisSearchTerm_Part_1, theResultSelections, thisComment, theSelectionsArray)
set thisLocation to (theseSearchTerm_Components_1's |count|())
set thisLength to (thisSearchTerm_Components's |count|()) - thisLocation
set theseSearchTerm_Components_2 to (thisSearchTerm_Components's subarrayWithRange:{thisLocation, thisLength})
set thisSearchTerm_Part_2 to (theseSearchTerm_Components_2's componentsJoinedByString:space)
set theResultSelections to (thePDF's findString:thisSearchTerm_Part_2 withOptions:0)
if theResultSelections's |count|() ≠ 0 then
set theSelectionsArray to my _addSelectionToSelectionsArray(thisSearchTerm_Part_2, theResultSelections, thisComment, theSelectionsArray)
end if
exit repeat
end if
end repeat
end if
end repeat
set theDeeperLookArray to current application's NSMutableArray's new()
repeat with i from 0 to ((theItems's |count|()) - 1)
set thisSearchTerm to ((current application's NSArray's arrayWithArray:(theItems's objectAtIndex:i))'s objectAtIndex:0)
repeat with j from 0 to ((theItems's |count|()) - 1)
set thatSearchTerm to ((current application's NSArray's arrayWithArray:(theItems's objectAtIndex:j))'s objectAtIndex:0)
if (thisSearchTerm's containsString:thatSearchTerm) and not (thisSearchTerm's isEqualTo:thatSearchTerm) then
(theDeeperLookArray's addObject:thatSearchTerm)
end if
end repeat
end repeat
if theHighlightColorIndex < 1 or theHighlightColorIndex > 7 then error "No valid HighlightColor index. Valid: 1-7"
if current application's id = "com.devon-technologies.think3" then
set theDefaults to current application's NSUserDefaults's standardUserDefaults()
else
set theDefaults to current application's NSUserDefaults's alloc()'s initWithSuiteName:"com.devon-technologies.think3"
end if
set theDictionary to (theDefaults's dictionaryRepresentation())'s dictionaryWithValuesForKeys:{"HighlightColor-0", "HighlightColor-1", "HighlightColor-2", "HighlightColor-3", "HighlightColor-4", "HighlightColor-5", "HighlightColor-6"}
set theColorDictionary to theDictionary's objectForKey:("HighlightColor-" & ((theHighlightColorIndex - 1) as string))
set theRed to current application's NSNumber's numberWithDouble:((theColorDictionary's valueForKey:"red"))
set theGreen to current application's NSNumber's numberWithDouble:((theColorDictionary's valueForKey:"green"))
set theBlue to current application's NSNumber's numberWithDouble:((theColorDictionary's valueForKey:"blue"))
set theAlpha to current application's NSNumber's numberWithDouble:((theColorDictionary's valueForKey:"alpha"))
set theHighlightColor to current application's NSColor's colorWithCalibratedRed:theRed green:theGreen blue:theBlue alpha:theAlpha
set theDocumentAuthor to (theDefaults's dictionaryRepresentation())'s stringForKey:"DocumentAuthor"
set theAnnotationSubtype to current application's PDFAnnotationSubtypeHighlight
set theAnnotationProperties to current application's NSMutableDictionary's new()
(theAnnotationProperties's setObject:theHighlightColor forKey:(current application's PDFAnnotationKeyColor))
repeat with i from 0 to ((theSelectionsArray's |count|()) - 1)
set thisItem to (theSelectionsArray's objectAtIndex:i)
set thisSearchTerm to (thisItem's valueForKey:"SearchTerm")
set thisComment to (thisItem's valueForKey:"Comment")
set thisResultSelection_BoundsForPage to (thisItem's valueForKey:"Selection_BoundsForPage")
set thisResultSelection_Lines_QuadPointsArray to (thisItem's valueForKey:"Selection_Lines_QuadPoints")
set createAnnotation to true
if (theDeeperLookArray's containsObject:thisSearchTerm) then
set thisResultSelection_Page_Label to (thisItem's valueForKey:"Selection_Page_Label")
set thisSelectionsArray_filtered to (theSelectionsArray's filteredArrayUsingPredicate:(current application's NSPredicate's predicateWithFormat:("self.Selection_Page_Label = " & quoted form of (thisResultSelection_Page_Label as string) & " AND " & "self.SearchTerm CONTAINS " & quoted form of (thisSearchTerm as string) & " AND " & "!self.SearchTerm = " & quoted form of (thisSearchTerm as string))))
repeat with j from 0 to ((thisSelectionsArray_filtered's |count|()) - 1)
set thatItem to (thisSelectionsArray_filtered's objectAtIndex:j)
set thatResultSelection_BoundsForPage to (thatItem's valueForKey:"Selection_BoundsForPage")
set thatResultSelection_Lines_QuadPointsArray to (thatItem's valueForKey:"Selection_Lines_QuadPoints")
set intersectsBoundsForPage to current application's NSIntersectsRect(thisResultSelection_BoundsForPage, thatResultSelection_BoundsForPage)
if intersectsBoundsForPage then
repeat with k from 0 to ((thisResultSelection_Lines_QuadPointsArray's |count|()) - 1) by 4
set thisResultSelection_Line_QuadPoints to (thisResultSelection_Lines_QuadPointsArray's subarrayWithRange:{k, 4})
set thisResultSelection_Line_Bounds to my makeRect(thisResultSelection_Line_QuadPoints)
repeat with l from 0 to ((thatResultSelection_Lines_QuadPointsArray's |count|()) - 1) by 4
set thatResultSelection_Line_QuadPoints to (thatResultSelection_Lines_QuadPointsArray's subarrayWithRange:{l, 4})
set thatResultSelection_Line_Bounds to my makeRect(thatResultSelection_Line_QuadPoints)
set intersectsBoundsForLine to current application's NSIntersectsRect(thisResultSelection_Line_Bounds, thatResultSelection_Line_Bounds)
if intersectsBoundsForLine then
set createAnnotation to false
exit repeat
end if
end repeat
if intersectsBoundsForLine then exit repeat
end repeat
if intersectsBoundsForLine then exit repeat
end if
end repeat
end if
if createAnnotation then
set thisDate to (current application's NSDate's |date|())
(theAnnotationProperties's setObject:thisDate forKey:(current application's PDFAnnotationKeyDate))
set thisAnnotation to (current application's PDFAnnotation's alloc()'s initWithBounds:(thisResultSelection_BoundsForPage) forType:theAnnotationSubtype withProperties:theAnnotationProperties)
set thisAnnotation_QuadPoints to current application's NSMutableArray's new()
repeat with i from 0 to ((thisResultSelection_Lines_QuadPointsArray's |count|()) - 1)
(thisAnnotation_QuadPoints's addObject:(current application's NSValue's valueWithPoint:(thisResultSelection_Lines_QuadPointsArray's objectAtIndex:i)))
end repeat
(thisAnnotation's setValue:(thisAnnotation_QuadPoints) forAnnotationKey:(current application's PDFAnnotationKeyQuadPoints))
if not (theDocumentAuthor's isEqualTo:"") then (thisAnnotation's setUserName:theDocumentAuthor)
set thisResultSelection_Page to (thisItem's valueForKey:"Selection_Page")
if boolAdd then
if thisComment ≠ "" then
-- add popup
set popupAnnotationSubtype to current application's PDFAnnotationSubtypePopup
set popupAnnotation to (current application's PDFAnnotation's alloc()'s initWithBounds:(thisResultSelection_BoundsForPage) forType:popupAnnotationSubtype withProperties:theAnnotationProperties)
(thisAnnotation's setValue:thisComment forAnnotationKey:(current application's PDFAnnotationKeyContents))
(thisAnnotation's setValue:popupAnnotation forAnnotationKey:(current application's PDFAnnotationKeyPopup))
end if
(thisResultSelection_Page's addAnnotation:thisAnnotation)
else
set thisPosX to first item of (first item of thisAnnotation's |bounds|())
set thisPosY to second item of (first item of thisAnnotation's |bounds|())
(theAnnotationsArray's addObject:{page:(thisResultSelection_Page's label as integer) - 1, posY:thisPosY, posX:thisPosX, highlight:thisSearchTerm, comment:thisComment, isDuplicate:false})
end if
end if
end repeat
if boolAdd then
thePDF's writeToFile:thePath
else
return my sortAnnotationsArray(theAnnotationsArray)
-- return theAnnotationsArray
end if
on error error_message number error_number
activate
if the error_number is not -128 then display alert "Error: Handler \"_processAnnotations\"" message error_message as warning
error number -128
end try
end _processAnnotations
on _addSelectionToSelectionsArray(thisSearchTerm, theResultSelections, theComment, theSelectionsArray)
try
repeat with i from 0 to ((theResultSelections's |count|()) - 1)
set thisResultSelection to (theResultSelections's objectAtIndex:i)
set thisResultSelection_Pages to thisResultSelection's pages()
repeat with j from 0 to ((thisResultSelection_Pages's |count|()) - 1)
set thisResultSelection_Page to (thisResultSelection_Pages's objectAtIndex:j)
set thisResultSelection_BoundsForPage to (thisResultSelection's boundsForPage:thisResultSelection_Page)
set thisResultSelection_Lines to thisResultSelection's selectionsByLine
set thisResultSelection_Lines_QuadPointsArray to current application's NSMutableArray's new()
repeat with k from 0 to ((thisResultSelection_Lines's |count|()) - 1)
set thisResultSelection_Line to (thisResultSelection_Lines's objectAtIndex:k)
set thisResultSelection_Line_Page to (thisResultSelection_Line's pages())'s firstObject()
if (thisResultSelection_Line_Page's isEqualTo:thisResultSelection_Page) then
set thisResultSelection_Line_BoundsForPage to (thisResultSelection_Line's boundsForPage:thisResultSelection_Line_Page)
set thisResultSelection_Line_BoundsForPage to current application's NSRect's NSInsetRect(thisResultSelection_Line_BoundsForPage, -1, -1) -- DEVONthink recognizes text more reliably
set MinX to current application's NSRect's NSMinX(thisResultSelection_Line_BoundsForPage)
set MinY to current application's NSRect's NSMinY(thisResultSelection_Line_BoundsForPage)
set MaxX to current application's NSRect's NSMaxX(thisResultSelection_Line_BoundsForPage)
set MaxY to current application's NSRect's NSMaxY(thisResultSelection_Line_BoundsForPage)
(thisResultSelection_Lines_QuadPointsArray's addObject:{MinX, MaxY})
(thisResultSelection_Lines_QuadPointsArray's addObject:{MaxX, MaxY})
(thisResultSelection_Lines_QuadPointsArray's addObject:{MinX, MinY})
(thisResultSelection_Lines_QuadPointsArray's addObject:{MaxX, MinY})
end if
end repeat
set thisResultSelection_Page_Label to thisResultSelection_Page's label()
(theSelectionsArray's addObject:{Selection_Page:thisResultSelection_Page, Selection_Page_Label:thisResultSelection_Page_Label, Selection_BoundsForPage:thisResultSelection_BoundsForPage, Selection_Lines_QuadPoints:thisResultSelection_Lines_QuadPointsArray, SearchTerm:thisSearchTerm, |Comment|:theComment})
end repeat
end repeat
return theSelectionsArray
on error error_message number error_number
activate
if the error_number is not -128 then display alert "Error: Handler \"_addSelectionToSelectionsArray\"" message error_message as warning
error number -128
end try
end _addSelectionToSelectionsArray
on addAnnotations(thePath, theItems, theHighlightColorIndex)
return my _processAnnotations(thePath, theItems, theHighlightColorIndex, true)
end addAnnotations
on getAnnotationsFromPDFMatch(thePath, theItems)
return my _processAnnotations(thePath, theItems, 1, false)
end getAnnotationsFromPDFMatch
on getAnnotationsFromPDF(thePath)
set thePDF to current application's PDFDocument's alloc()'s initWithURL:(current application's |NSURL|'s fileURLWithPath:thePath)
set theAnnotationsArray to current application's NSMutableArray's new()
repeat with i from 0 to ((thePDF's |pageCount|()) - 1)
set thePage to (thePDF's pageAtIndex:i)
set theAnnotations to thePage's annotations()
repeat with theAnnotation in theAnnotations
if theAnnotation's |type|() as string is "Highlight" then
set thePosY to second item of (first item of theAnnotation's |bounds|())
set thePosX to first item of (first item of theAnnotation's |bounds|())
set theHighlight to theAnnotation's |textMarkupString|()
set theComment to theAnnotation's |contents|()
(theAnnotationsArray's addObject:{page:i, posY:thePosY, posX:thePosX, highlight:theHighlight, comment:theComment, isDuplicate:false})
end if
end repeat
end repeat
return my sortAnnotationsArray(theAnnotationsArray)
-- return theAnnotationsArray
end getAnnotationsFromPDF
on getAnnotationsMerged(thePath, theItems)
set annotationsFromMarkdown to my getAnnotationsFromPDFMatch(thePath, theItems)
set annotationsFromPDF to my getAnnotationsFromPDF(thePath)
return my mergeAnnotationsArray(annotationsFromMarkdown, annotationsFromPDF) as list
end getAnnotationsMerged
on mergeAnnotationsArray(theAnnotationsArrayA, theAnnotationsArrayB)
set theMergedArray to current application's NSMutableArray's new()
set uniqueHighlightsFromB to (theAnnotationsArrayB's filteredArrayUsingPredicate:(current application's NSPredicate's predicateWithFormat:("NOT self.highlight IN %@") argumentArray:{(theAnnotationsArrayA's valueForKey:"highlight")}))
(theMergedArray's addObjectsFromArray:(theAnnotationsArrayA))
(theMergedArray's addObjectsFromArray:(uniqueHighlightsFromB))
return my sortAnnotationsArray(theMergedArray)
end mergeAnnotationsArray
on sortAnnotationsArray(theAnnotationsArray)
-- Reading order: page first, then top to bottom, then left to right.
-- PDF page coordinates start at the bottom-left corner, so a larger posY is higher on the page; hence posY descending.
set theDescriptorPage to current application's NSSortDescriptor's sortDescriptorWithKey:"page" ascending:true
set theDescriptorPosY to current application's NSSortDescriptor's sortDescriptorWithKey:"posY" ascending:false
set theDescriptorPosX to current application's NSSortDescriptor's sortDescriptorWithKey:"posX" ascending:true
return theAnnotationsArray's sortedArrayUsingDescriptors:{theDescriptorPage, theDescriptorPosY, theDescriptorPosX}
end sortAnnotationsArray
on makeRect(theSelection_QuadPoints)
try
set theSelection_QuadPoints to (theSelection_QuadPoints as list)
set MinX to theSelection_QuadPoints's item 1's item 1
set MinY to theSelection_QuadPoints's item 3's item 2
set theWidth to (theSelection_QuadPoints's item 2's item 1) - MinX
set theHeight to (theSelection_QuadPoints's item 1's item 2) - MinY
set theRect to current application's NSRect's NSMakeRect(MinX, MinY, theWidth, theHeight)
return theRect -- explicit return instead of relying on the implicit result
on error error_message number error_number
activate
if the error_number is not -128 then display alert "Error: Handler \"makeRect\"" message error_message as warning
error number -128
end try
end makeRect
on tid(theInput, theDelimiter)
set d to AppleScript's text item delimiters
set AppleScript's text item delimiters to theDelimiter
if class of theInput = text then
set theOutput to text items of theInput
else if class of theInput = list then
set theOutput to theInput as text
end if
set AppleScript's text item delimiters to d
return theOutput
end tid
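To make the merge rule in mergeAnnotationsArray easier to follow, here is a plain-AppleScript sketch of the same idea without NSPredicate: keep everything from the first list, then add only those entries from the second list whose highlight text does not already occur in the first. The handler name mergeByHighlight and the sample data are hypothetical, and the final sort step is left out.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- Hypothetical plain-AppleScript illustration of the merge rule:
-- keep every entry of listA, then add entries of listB whose
-- highlight text is not already present in listA.
on mergeByHighlight(listA, listB)
	set knownHighlights to {}
	repeat with entryA in listA
		set end of knownHighlights to |highlight| of entryA
	end repeat
	set theMerged to items of listA -- copy, so listA itself is left untouched
	considering case -- plain AppleScript text comparison ignores case by default
		repeat with entryB in listB
			if knownHighlights does not contain (|highlight| of entryB) then
				set end of theMerged to contents of entryB
			end if
		end repeat
	end considering
	return theMerged
end mergeByHighlight

-- Made-up sample data: one highlight exists in both sources, one only in the PDF.
set fromMarkdown to {{|highlight|:"first passage", |comment|:"Comment A"}}
set fromPDF to {{|highlight|:"first passage", |comment|:""}, {|highlight|:"second passage", |comment|:""}}
mergeByHighlight(fromMarkdown, fromPDF)
```

With the sample data the shared highlight keeps the Markdown entry and its comment, so the comment typed in the annotation file wins over the bare PDF highlight.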
Markdown Annotation helper functions
use AppleScript version "2.4" -- Yosemite (10.10) or later
use script "RegexAndStuffLib" version "1.0.7"
use DT : script "DEVONthink helper"
use scripting additions
on getSections(maText)
-- item 1: frontmatter
-- item 2: info (title and excerpt)
-- item 3: content
-- set maSections to regex split maText search pattern "^---$"
set maSections to regex search maText search pattern "^---\n(.*?)\n---\n*((?:.*?# .*?(?:\n|$))?\n*(?:[Ee]xcerpt::.*?(?:\n|$))?)\n*(.*)" capture groups {1, 2, 3} with dot matches all
if maSections is equal to {} or ((count of maSections) > 0 and ((count of (item 1 of maSections)) < 2)) then error "Markdown file not properly formatted (no frontmatter section?)"
return item 1 of maSections
end getSections
on getContentFromList(maSections)
return item 3 of maSections
end getContentFromList
on getContentFromText(maText)
set maSections to getSections(maText)
return my getContentFromList(maSections)
end getContentFromText
on getContent(maText)
return my getContentFromText(maText)
end getContent
on getAnnotationsFromText(maText)
set theContent to my getContent(maText)
-- Fix for 3.8.3 putting preceding comments with blockquote
set theResults to regex search theContent search pattern "(?<=>)\\s*(.*?)\n*(?=(?:>|$))" capture groups 1 with dot matches all without anchors match lines
set theAnnotations to {}
repeat with theResult in theResults
set end of theAnnotations to first item of (regex search theResult search pattern "(.*)\n*(.*)" capture groups {1, 2})
end repeat
return theAnnotations
end getAnnotationsFromText