The content of this workflow and its scripts is too large to fit in one post; see these posts for the other parts:
- Additional resources - Automatically capture and annotate items: Markdown Annotation Helper
- Additional resources - Automatically capture and annotate items: DEVONthink helper, Smart rule scripts, JS/Markdown helper
I’ve read several threads on this forum about creating and extracting highlights, backlinking, working together with Obsidian, etc. I’ve been developing a system to clip content, automatically capture PDFs (for long-term reference and search), automatically create and update annotations for captured resources, and link to and use those annotations in sync with Obsidian.
What these scripts do
- Automatically capture clipped bookmarks / URLs to PDF
- Automatically create an annotation for all captured content (bookmarks and PDF)
- Automatically link annotation files to captured content
- Update information in DT when you update the annotation (.md) file (e.g. changing tags)
- Update information in the .md file when you update the item in DT (e.g. changing tags)
- Show links to original files and DT items based on Markdown metadata using JS (see last script)
- Use Keyboard Maestro to open files in Finder (circumventing Obsidian’s file:// restrictions)
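Keeping tags in sync in both directions mostly comes down to merging the DEVONthink tags with the tags from the annotation’s front matter without creating duplicates. The smart rule script does this with uniqueList from the DEVONthink helper (in the other post); as a rough stand-in, here is a minimal sketch of such a merge. The handler name mergeTags is hypothetical and not part of the helper scripts.

```applescript
-- Hypothetical helper (not from the original helper scripts):
-- merge two tag lists, keep the first occurrence of each tag,
-- preserve order and skip empty tags.
on mergeTags(firstTags, secondTags)
	set mergedTags to {}
	repeat with aTag in (firstTags & secondTags)
		set theTag to contents of aTag -- dereference the loop item
		if theTag is not "" and mergedTags does not contain theTag then
			set end of mergedTags to theTag
		end if
	end repeat
	return mergedTags
end mergeTags

-- Usage: tags from DEVONthink plus tags from the annotation file
mergeTags({"fiddle", "pkm"}, {"pkm", "devonthink"})
--> {"fiddle", "pkm", "devonthink"}
```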
Caveat emptor
This is a highly personal setup. I’m providing these scripts and workflows because they might help others (and I’ve been able to build this myself thanks to the very helpful posts and comments on this forum!). There might be some generic pieces that could be interesting, e.g. the Markdown processing in the helper scripts (see the other posts).
I’d be surprised if anyone gets this set up (or even wants to) in the same way I have, as my requirements are probably highly peculiar anyway. I’m probably not able to offer much support, so this is mostly for people who are quite familiar with scripting in DEVONthink. It’s all AppleScript, so I’m just waiting for @chrillek to write a JXA version of all of this.
Why such an elaborate workflow?
My reasoning is that the annotation file can hold all “outgoing” information (the original URL, title, capture date, DT links, tags), while I stay ‘independent’ of DT should it go away or become unavailable. It also avoids saving all my captured (PDF) content directly to my Obsidian vault while still keeping all the context (URLs, highlights, notes etc.), which makes everything more lightweight for daily use. Linking to a ‘resource’ in Obsidian means linking to the .md file that holds all the relevant context to continue from or add information to.
Workflow
Once you’ve installed everything in this post (a lot!), you can clip something as a bookmark or a PDF and automatically create or update annotations or items. As an added bonus, you can also capture content by clipping it via an imported Markdown file (e.g. via MarkDownload). Actually, using MarkDownload to clip content was how I originally started; currently I’m mostly clipping bookmarks or PDFs directly.
Annotation files
An annotation file looks like this (below) and is ‘linked’ via set annotation. I’m not using the standard Annotation group or naming that DT uses; instead I put all annotations in a single annotationsGroup, e.g. /Notes/Content, which is a folder in my Obsidian vault (this vault is also indexed in my DT).
---
date: 2022-03-13 22:50
url: https://gist.github.com/itst/780dee5c510db6d1327c34c39166eb0f
itemurl: x-devonthink-item://D6C8E1D6-B386-44BC-98DB-6FA7E08F9BDF
annotationurl: x-devonthink-item://95CCC418-3AED-4016-A02E-C4FCC7A67B9B
path: Resources/fiddle/pkm/read-later/Import and regularly replicate your Pinboard bookmarks in DEVONthink.pdf
tags: [fiddle,pkm,read-later,devonthink]
---
Excerpt:: Import and regularly replicate your Pinboard bookmarks in DEVONthink. - Pinboard.scpt
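Reading a value back from this front matter (e.g. the url line) can be done with plain AppleScript. This is a minimal sketch, not the getURL handler from the Markdown Annotation Helper: the handler frontMatterValue is hypothetical, and it assumes simple one-line "key: value" entries between the first two --- lines (no multi-line YAML values or quoting).

```applescript
-- Hypothetical helper (not from the original helper scripts): return the
-- value of a "key: value" line inside the front matter, i.e. between the
-- first two "---" lines. Returns "" if the key isn't found.
on frontMatterValue(theText, theKey)
	set thePrefix to theKey & ":"
	set dashCount to 0
	repeat with aLine in (paragraphs of theText)
		set theLine to contents of aLine
		if theLine is "---" then
			set dashCount to dashCount + 1
			if dashCount = 2 then exit repeat -- end of the front matter
		else if dashCount = 1 and theLine starts with thePrefix then
			-- nothing after "key:" means an empty value
			if (length of theLine) = (length of thePrefix) then return ""
			set theValue to text ((length of thePrefix) + 1) thru -1 of theLine
			-- drop the spaces between "key:" and the value
			repeat while theValue starts with " "
				if (length of theValue) = 1 then return ""
				set theValue to text 2 thru -1 of theValue
			end repeat
			return theValue
		end if
	end repeat
	return ""
end frontMatterValue

set sampleText to "---" & linefeed & "url: https://gist.github.com/itst/780dee5c510db6d1327c34c39166eb0f" & linefeed & "tags: [fiddle,pkm]" & linefeed & "---"
frontMatterValue(sampleText, "url")
--> "https://gist.github.com/itst/780dee5c510db6d1327c34c39166eb0f"
```

Because it matches on the full "key:" prefix, looking up url does not accidentally pick up the itemurl or annotationurl lines.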
To install these scripts:
- Check the other posts and scripts: Additional resources - Automatically capture and annotate items: Markdown Annotation Helper and Additional resources - Automatically capture and annotate items: DEVONthink helper, Smart rule scripts, JS/Markdown helper
- Create a “/Content” group in a database (e.g. Resources) to put all your content in. I’m using Group Tags instead of folders
- All captured content is put in “/Content/00-captured”
- Add a custom metadata item ‘originaltags’ (Single-line text). This is needed to keep the original tags (added when clipping content) when using Classify.
- Download RegexAndStuffLib v1.0.7 (https://s3.amazonaws.com/latenightsw.com/ShaneLibs/RegexAndStuffLib_stuff.zip) and install it into ~/Library/Script Libraries/; see RegexAndStuffLib Script Library - AppleScript - Late Night Software Ltd. for more info
- Install Readability.js in /Users/mdbraber/Library/Application Scripts/com.devon-technologies.think3/Smart Rules
- Install the scripts below in the Smart Rules directory (/Users/mdbraber/Library/Application Scripts/com.devon-technologies.think3/Smart Rules)
- Set up the Smart Rules and inline AppleScripts. I’m using a Smart Rule on the Inbox and put all captured content in “/Content/00-captured” for further processing (mostly tagging)
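The ‘originaltags’ field is single-line text, so the tag list has to be flattened into one line. The smart rule script does this with join strings from RegexAndStuffLib; the sketch below shows the same round trip with plain text item delimiters, assuming tags never contain commas. The handler names tagsToLine and lineToTags are hypothetical.

```applescript
-- Hypothetical helpers: flatten a tag list into "a,b,c" for a
-- single-line text custom metadata field, and split it back again.
-- Assumes no tag contains a comma.
on tagsToLine(theTags)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ","
	set theLine to theTags as text
	set AppleScript's text item delimiters to savedDelimiters -- always restore
	return theLine
end tagsToLine

on lineToTags(theLine)
	if theLine is "" then return {}
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ","
	set theTags to text items of theLine
	set AppleScript's text item delimiters to savedDelimiters
	return theTags
end lineToTags

tagsToLine({"fiddle", "pkm", "read-later", "devonthink"})
--> "fiddle,pkm,read-later,devonthink"
```

Restoring the saved delimiters matters because text item delimiters are global to the script and would otherwise affect later list-to-text coercions.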
Bugs / TODO
- Probably this whole thing is hard to figure out anyway, so I’d be surprised if anyone gets this set up, but maybe there are bits and pieces that are useful for someone
- Comments from clipped content (e.g. a bookmark) are treated as an Excerpt in the .md file. I still need to add some regex to support both comments and an excerpt inside the Comments field.
- I’ve got some code to automatically extract highlights from PDFs and do the reverse: use text from an annotation file as highlight in PDFs. It’s mostly barebones for now, I might share this at a later stage.
AppleScript: Process incoming annotation
use DT : script "DEVONthink helper"
use ma : script "Markdown Annotation helper"
use script "RegexAndStuffLib" version "1.0.7"
use scripting additions
on run
tell application id "DNtp" to my performSmartRule(selection as list)
end run
-- Run as smart rule
on performSmartRule(theRecords)
tell application id "DNtp"
repeat with theRecord in theRecords
repeat 1 times -- fake loop to create a simulated continue
-- initialize variables
set captureRecord to missing value
set maRecord to missing value
set pdfRecord to missing value
set theRecordType to (type of theRecord as string)
set maText to ""
set theDatabase to database of theRecord
-- check if group for processed pdf exists
set processedGroup to get record at "/Content/00-captured" in theDatabase
if processedGroup is missing value then
error "No processed group \"/Content/00-captured\" found in current database - create the group first"
end if
-- check if group for annotations exists
set theAnnotationsGroup to "/Notes/Content"
set annotationsGroup to get record at (theAnnotationsGroup) in theDatabase
if annotationsGroup is missing value then
error "No annotations group (" & theAnnotationsGroup & ") found in current database - create the group first"
end if
if theRecordType is in {"markdown", "«constant ****mkdn»"} then
-- process markdown record
set maRecord to theRecord
set maText to plain text of maRecord
set maTitle to name without extension of maRecord
set maTitle to DT's sanitize(maTitle)
set maURL to ma's getURL(maText)
set maDate to ma's getDate(maText)
set maTags to ma's getTags(maText)
set maExcerpt to ma's getExcerpt(maText)
if maExcerpt is missing value then set maExcerpt to ""
-- Clear the URL of the markdown record, which MarkDownload fills with base64 content
set URL of maRecord to missing value
if maURL is not equal to "" then
-- Create a temporary record to capture
set captureRecord to create record with {URL:maURL, type:bookmark} in current group
else
log message "No URL found - skipping"
exit repeat
end if
else if theRecordType is in {"bookmark", "«constant ****DTnx»"} then
-- process bookmark record
set maTitle to name without extension of theRecord
set maTitle to DT's sanitize(maTitle)
set maURL to URL of theRecord
set maCreationDate to creation date of theRecord
set maDate to DT's formatDate(maCreationDate) as string
set maTags to {}
if comment of theRecord is not equal to "" then
set maExcerpt to comment of theRecord
else
set maExcerpt to ""
end if
-- Set the bookmark as the record to capture (will be deleted after capture)
set captureRecord to theRecord
else if theRecordType is in {"pdf", "PDF document", "«constant ****pdf »"} then
-- Clean up the PDF title (we can't be sure DT already sanitized the filename,
-- e.g. for old imports made before filename sanitizing was added)
set maTitle to name without extension of theRecord
set maTitle to DT's sanitize(maTitle)
-- The title of theRecord always takes precedence and overwrites whatever is in the annotation file
-- Include ".pdf", otherwise the rename goes wrong when the title ends with another valid extension
set name of theRecord to maTitle & ".pdf"
set maCreationDate to creation date of theRecord
set maDate to DT's formatDate(maCreationDate) as string
if (exists annotation of theRecord) then
set currentAnnotationType to type of (annotation of theRecord) as string
if currentAnnotationType is in {"markdown", "«constant ****mkdn»"} then
set maRecord to annotation of theRecord
set maText to plain text of maRecord
-- Get URL from theRecord or otherwise from annotation
if URL of theRecord is not "" then
set maURL to URL of theRecord
else
set maURL to ma's getURL(maText)
end if
-- Get tags from annotation
set maTags to ma's getTags(maText)
set maExcerpt to ma's getExcerpt(maText)
-- Fall back to the record's comment if the annotation has no excerpt
if maExcerpt is missing value then set maExcerpt to ""
if maExcerpt is "" and comment of theRecord is not "" then
set maExcerpt to comment of theRecord
end if
else
error "Annotation of selected PDF is not of type markdown - cancelling"
end if
else
set maURL to URL of theRecord
set maTags to {}
if comment of theRecord is not equal to "" then
set maExcerpt to comment of theRecord
else
set maExcerpt to ""
end if
end if
set pdfRecord to theRecord
else
error "Cannot process this type of record"
end if
-- capture pdf if necessary
if captureRecord is not missing value and maURL is not "" then
set captureWindow to open window for record captureRecord with force
delay 2
set bounds of captureWindow to {0, 0, 900, 900}
-- If the URL already points to a PDF, there's no need to wait for loading or to scroll.
if (maURL ends with ".pdf") is not true then
-- Wait until it's finished loading.
repeat while loading of captureWindow
delay 0.5
end repeat
-- Some pages load content dynamically, with elements not
-- displayed until they come into view. This is a hopeless
-- situation in general but the following heuristic improves
-- outcomes for some cases. We scroll the window by quarters
-- to try to trigger loading of more page elements.
repeat with n from 1 to 4
set scroll to "window.scrollTo(0," & n & "*document.body.scrollHeight/4)"
do JavaScript scroll in current tab of captureWindow
delay 0.75
end repeat
-- Return to the top. Do it twice because sometimes on some
-- pages (notably Twitter), the first attempt gets stuck in
-- some random location. (Ugh, what a hack this is.)
do JavaScript "window.scrollTo(0,0)" in current tab of captureWindow
delay 0.5
do JavaScript "window.scrollTo(0,0)" in current tab of captureWindow
delay 0.25
end if
-- Get the content of the current viewer window as a PDF
set contentAsPDF to get PDF of captureWindow
-- Create the new record in the processed group
set pdfRecord to create record with {name:maTitle, URL:maURL, type:PDF document} in processedGroup
set data of pdfRecord to contentAsPDF
-- Match dates of pdfRecord to theRecord
set recordCreationDate to creation date of theRecord
set recordModificationDate to modification date of theRecord
set creation date of pdfRecord to recordCreationDate
set modification date of pdfRecord to recordModificationDate
-- Load Readability.js from the current user's Smart Rules folder
-- (instead of a hard-coded /Users/... path)
set theCurrentDirectory to (path to library folder from user domain as text) & "Application Scripts:com.devon-technologies.think3:Smart Rules:"
set readabilityScriptFile to (theCurrentDirectory & "Readability.js") as alias
set readabilityScript to read readabilityScriptFile as «class utf8»
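-- Get an excerpt of the page, or use the comment of the record being captured
if comment of captureRecord is "" then
do JavaScript readabilityScript in current tab of captureWindow
set theExcerpt to do JavaScript "var article = new Readability(document.cloneNode(true)).parse(); (article && article.excerpt) || '';" in current tab of captureWindow
else
set theExcerpt to comment of captureRecord
end if
-- Only use it when no excerpt was found earlier
if (maExcerpt is missing value or maExcerpt is "") and theExcerpt is not missing value then set maExcerpt to theExcerpt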
close captureWindow
end if
set theLocation to location of theRecord
if theLocation does not start with "/Content" then
move record pdfRecord to processedGroup
end if
-- Update comments
if maExcerpt is not equal to "" then set comment of pdfRecord to maExcerpt
-- Update annotation
if maRecord is missing value then set maRecord to create record with {name:maTitle, type:markdown} in annotationsGroup
set maItemURL to (reference URL of pdfRecord as string)
set maAnnotationURL to (reference URL of maRecord as string)
set maTags to DT's uniqueList((tags of pdfRecord) & maTags)
set maPath to path of pdfRecord
set maText to ma's updateText("", maDate, maTitle, maExcerpt, maURL, maItemURL, maAnnotationURL, maPath, maTags, true)
set plain text of maRecord to maText
set creation date of maRecord to (creation date of pdfRecord)
--set modification date of maRecord to (modification date of pdfRecord)
move record maRecord to annotationsGroup
-- Update pdfRecord annotation and tags
set annotation of pdfRecord to maRecord
set the tags of pdfRecord to maTags
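-- Keep the original tags as a comma-separated line in the "originaltags" custom metadata.
-- Store them on pdfRecord: a captured bookmark (theRecord) is deleted below.
set originalTags to join strings maTags using delimiter ","
add custom meta data originalTags for "originaltags" to pdfRecord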
try
delete record captureRecord
end try
end repeat
end repeat
end tell
end performSmartRule
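The repeat 1 times loop near the top of performSmartRule is a common AppleScript workaround for the missing continue statement: exit repeat only leaves the one-pass inner loop, so processing moves on to the next record. A stripped-down illustration, with a hypothetical urlsToCapture handler and made-up example URLs:

```applescript
-- Minimal demo of the "repeat 1 times" trick: AppleScript has no
-- "continue", so the loop body is wrapped in a one-pass inner loop and
-- "exit repeat" skips to the next item of the outer loop.
on urlsToCapture(theURLs)
	set kept to {}
	repeat with aURL in theURLs
		repeat 1 times -- simulated continue
			set theURL to contents of aURL
			if theURL is "" then exit repeat -- skip, like "No URL found - skipping"
			set end of kept to theURL
		end repeat
	end repeat
	return kept
end urlsToCapture

urlsToCapture({"https://example.com/a", "", "https://example.com/b"})
--> {"https://example.com/a", "https://example.com/b"}
```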