Copying PDF Text, Citation, and Clickable Link to Clipboard

In conducting research, I regularly read through PDFs of legal cases, copying key passages to a separate document along with citation information for them. I’d like to use backlinks so that the citation in the separate document provides a clickable link that would take me to the page of the PDF on which the quoted material appears. I’ve been trying to use Keyboard Maestro and AppleScript to automate that process, and have it working in Markdown (using a series of custom metadata fields in DEVONthink). However, I’d like to be able to use RTF instead of Markdown, and I’ve struggled to get an RTF-friendly version of the script to generate a clickable version of the link.

To generate a Markdown link, I have a Keyboard Maestro macro set up that, when invoked, copies the selected text to the clipboard and then runs the following script:

tell application id "DNtp"
	repeat with thisRecord in (selection as list)
		if (type of thisRecord) is PDF document then
			set currentPage to (current page of think window 1) -- Zero-indexed: page 20 of the PDF is page=19. Add one before using the page number in text.
			set recURL to reference URL of thisRecord
			set xref to (recURL & "?page=" & currentPage)
			set customMD to custom meta data of thisRecord
			set mdcasename to (mdcasename of customMD)
			set mdcasecitation to (mdcasecitation of customMD)
			set mdcourt to (mdcourt of customMD)
			set mdfirstpage to (mdfirstpage of customMD)
			set pincite to ((mdfirstpage as number) + (currentPage))
			if mdcourt = "U.S." then
				set mdcourt to "" -- no court in the parenthetical for U.S. citations
			else
				set mdcourt to mdcourt & " "
			end if
			set mdyear to (mdyear of customMD)
			set the clipboard to ((the clipboard) & "  " & return & "--  " & return & "[" & "_" & mdcasename & "_" & ", " & mdcasecitation & ", " & pincite & " (" & mdcourt & mdyear & ")" & "]" & "(" & xref & ")")
			
		end if
	end repeat
	
	
end tell

That script places on the clipboard a copy of the selected text, followed by the case citation wrapped in brackets, followed by the xref (the reference URL plus the page) in parentheses. When run through Markdown conversion, that produces exactly what I’m looking for. But I haven’t been able to generate an equivalent for RTF. Indeed, even if I just leave the xref URL as text at the end of the clipboard snippet, it doesn’t show up as a clickable link in a DEVONthink RTF file (though strangely it does if I paste the same clipboard into, for example, OmniOutliner).
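To make the arithmetic concrete, here is a minimal sketch of how the page link and pincite are put together, with hard-coded sample values in place of the DEVONthink record and its metadata. The case name, citation, and the EXAMPLE-UUID item link are placeholders, and the court is left out as it would be for a "U.S." citation.

```applescript
-- Hypothetical sample values standing in for the DEVONthink record and its metadata
set currentPage to 3 -- zero-indexed, so this is the fourth page of the PDF
set mdfirstpage to 444 -- reporter page on which the opinion begins
set recURL to "x-devonthink-item://EXAMPLE-UUID" -- placeholder item link

-- The page link uses the zero-indexed page as-is
set xref to recURL & "?page=" & currentPage

-- First page plus the zero-indexed offset gives the reporter page being quoted
set pincite to (mdfirstpage as number) + currentPage

set mdLink to "[_Smith v. Jones_, 555 U.S. 444, " & pincite & " (2006)](" & xref & ")"
-- mdLink is now: [_Smith v. Jones_, 555 U.S. 444, 447 (2006)](x-devonthink-item://EXAMPLE-UUID?page=3)
```

Because the page index starts at zero, the first page of the PDF gives a pincite equal to the first-page field, with no extra adjustment needed.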

Is there a way to get the same end result as (1) the script above plus (2) Markdown conversion, without going through Markdown conversion, so that I can use these directly in a DEVONthink RTF file or paste them into another text editor? I’ve been trying to piece something together by reviewing other posts discussing backlinking, but my AppleScripting proficiency… leaves something to be desired.

I’ve added a handler to your script that copies both a Markdown and an RTF link.

Which link is inserted when you paste depends on the destination document: in RTF documents you’ll get an RTF link, in Markdown documents a Markdown link.

-- Set clipboard to selected text and versions of a link to the current PDF page (RTF and Markdown)

-- The handler sets the clipboard to both a Markdown and an RTF version of the link.
-- Which link is inserted when you paste depends on the destination document: RTF documents get the RTF link, Markdown documents the Markdown link.
-- It can be used to copy either links only or links plus text.

tell application id "DNtp"
	try
		set theRecords to selected records
		if theRecords = {} or (count theRecords) > 1 then error "Please select a PDF document"
		set thisRecord to item 1 of theRecords
		
		if (type of thisRecord) is PDF document then
			set currentPage to (current page of think window 1) -- Zero-indexed: page 20 of the PDF is page=19. Add one before using the page number in text.
			set recURL to reference URL of thisRecord
			set xref to (recURL & "?page=" & currentPage)
			set customMD to custom meta data of thisRecord
			set mdcasename to (mdcasename of customMD)
			set mdcasecitation to (mdcasecitation of customMD)
			set mdcourt to (mdcourt of customMD)
			set mdfirstpage to (mdfirstpage of customMD)
			set pincite to ((mdfirstpage as number) + (currentPage))
			if mdcourt = "U.S." then
				set mdcourt to "" -- no court in the parenthetical for U.S. citations
			else
				set mdcourt to mdcourt & " "
			end if
			set mdyear to (mdyear of customMD)
			
			-- get selected text
			try
				set selectedText to selected text of think window 1 & "" as string
			on error
				error "No text selected"
			end try
			
			-- copy links and selected text  
			set theName to ("_" & mdcasename & "_" & ", " & mdcasecitation & ", " & pincite & " (" & mdcourt & mdyear & ")") as string
			my copyMarkdownRTFLink(theName, xref, selectedText, linefeed & "--  " & linefeed, "<br>--  <br>")
			
			-- copy links without text
			#my copyMarkdownRTFLink(theName, xref, "", "", "")
			
		else
			error "Please select a PDF document"
		end if
		
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		return
	end try
end tell

on copyMarkdownRTFLink(theName, theURL, theText, theDelimiter_Markdown, theDelimiter_HTML)
	try
		set theName to my replace_String(theName, "&", "&amp;")
		set theName to my replace_String(theName, "<", "&lt;")
		set theName to my replace_String(theName, ">", "&gt;")
		set theHTMLLink to ("<a href=\"" & theURL & "\">" & theName & "</a>") as string
		do shell script "export LANG=\"en_US.UTF-8\" && echo '<font face=\"helvetica\">' " & quoted form of theText & quoted form of theDelimiter_HTML ¬
			& quoted form of theHTMLLink & "'</font>' | textutil -format html -convert rtf -inputencoding UTF-8 -stdin -stdout | pbcopy -Prefer rtf"
		if theText ≠ "" then set theText to theText & space & space
		set theMarkdownText to (theText & theDelimiter_Markdown & "[" & theName & "](" & theURL & ")") as string
		set theClipboard_record to the clipboard as record -- https://forum.latenightsw.com/t/how-do-i-set-clipboard-pasteboard-to-both-rich-text-rtf-and-plain-text/1189/3
		set theClipboard_RTFdata to «class RTF » of theClipboard_record -- binary RTF data
		set the clipboard to {Unicode text:theMarkdownText, «class RTF »:theClipboard_RTFdata}
	on error error_message number error_number
		activate application id "DNtp"
		display alert "Error: Handler \"copyMarkdownRTFLink\"" message error_message as warning
		error number -128
	end try
end copyMarkdownRTFLink

on replace_String(theText, oldString, newString)
	local ASTID, theText, oldString, newString, lst
	set ASTID to AppleScript's text item delimiters
	try
		considering case
			set AppleScript's text item delimiters to oldString
			set lst to every text item of theText
			set AppleScript's text item delimiters to newString
			set theText to lst as string
		end considering
		set AppleScript's text item delimiters to ASTID
		return theText
	on error eMsg number eNum
		set AppleScript's text item delimiters to ASTID
		error "Can't replaceString: " & eMsg number eNum
	end try
end replace_String

Thank you - that sounds perfect. I’m certain that I’m overlooking something that would be obvious to someone who knows what they’re doing, but when I try to compile this, I get a syntax error for the line “set theRecords to selected records” stating “Expected end of line but found plural class name.”

Does that mean I’m compiling it incorrectly, or is there something else I’m overlooking?

Oh. Of course I tested the script and just looked up DEVONthink’s dictionary:

selected record
selected record (noun), pl selected records. A selected record.

No idea why you can’t compile it. Are you using DEVONthink 3.6?

Checked again, the script as posted above does compile in Script Debugger and Script Editor over here.

Well, that would be the problem - I was using a pre-3.6 version. Updating solved the problem, and this works beautifully. Thanks very much!

@bws950 I realize it’s been a while since you posted this, but I’ve been looking for a solution like this for legal research and doc review within DEVONthink for years (not sure how I never came across this post). I’m having trouble implementing it: I get the error message “The variable customMD is not defined.” Do you still run this using Keyboard Maestro, or can I run it as a plain AppleScript? Thanks in advance.

I get a similar error if the custom metadata fields used are empty (not sure if it is the same issue as yours). My workaround has been to pre-populate the fields of my case authorities with an underscore on import, using a script, and to edit the citation script so that it returns “” for that “empty” part of the citation.

I don’t know if that is strictly necessary or if there is a better way to deal with empty custom MD fields. @pete31 would know much more, I expect.
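One way to tolerate empty fields without placeholder values might be to wrap each lookup in a try block and fall back to an empty string. The sketch below uses a plain AppleScript record as a stand-in for `custom meta data of thisRecord`, with the court field deliberately left out. Whether DEVONthink omits empty fields from that record in exactly this way is an assumption, so treat this as a starting point rather than a tested fix.

```applescript
-- Stand-in for "custom meta data of thisRecord" (sample values; no mdcourt entry)
set customMD to {mdcasename:"Smith v. Jones", mdcasecitation:"555 U.S. 444", mdyear:"2006"}

-- Look up each field separately so that one missing field does not stop the script
try
	set mdcasename to mdcasename of customMD
on error
	set mdcasename to ""
end try

try
	set mdcourt to mdcourt of customMD
on error
	set mdcourt to "" -- field is absent, so fall back to an empty string
end try

-- Only add the trailing space when a court is actually present
if mdcourt is not "" then set mdcourt to mdcourt & " "
```

The same try block also catches the case where customMD itself was never defined, since referring to an undefined variable raises an error that lands in the `on error` branch.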

I get the same error if I don’t populate all the identified custom metadata fields, so I try to populate them when I add a new case to DEVONthink. I name the PDF in DEVONthink using a complete citation (e.g., Smith v. Jones, 555 U.S. 444 (2006)), add it to my “Legal Sources” group, add a “needsmetadata” tag, and then run the smart rule below on it, which uses a regular expression to (imperfectly) extract pieces of the full case name and citation and add them to the custom metadata fields that are used in the script.

The regex works best with federal appellate cases; it will have more trouble with trial court or state cases, where the case reporters are more varied. And even when the regex works, I always need to add the “Firstpage” metadata field manually, because it’s set up as an integer number (so that the pincite in the citation that the script outputs can be calculated using the PDF page) and I was never able to get the smart rule to fill in that integer number using output from the regex. All of the other metadata fields are Single-line text fields, which the smart rule is generally able to fill – though I also need to enter the “court” field manually if it’s a court that doesn’t appear in the parenthetical before the year (e.g., the Supreme Court).

You could probably substantially simplify the script by removing many of the custom metadata fields in order to just give yourself the file name and a link to the PDF page, but the extra complexity has been worth it to me in order to have an accurate citation that I can easily reference without needing to pull up the PDF itself.

To make it a little easier to copy in case you’re interested, here’s the regular expression:

(?<Party1>[A-Za-z].*)\s+v.\s+(?<Party2>[A-Za-z].*),\s+(?<Volume>\d+)\s+(?<Reporter>[^\s]+)\s+(?<Pincite>\d+)\s+\((?<Court>(?<=\()(.*?(?=[1-2][0-9][0-9][0-9])))(?<Year>\d{4})\)
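If you want to check what the expression captures outside of a smart rule, a rough sketch using AppleScriptObjC and NSRegularExpression could look like the following. This is not how the smart rule itself works, just a way to try the pattern on a sample file name. It assumes the literal parentheses around the court and year are escaped as `\(` and `\)`, and every backslash is doubled because the pattern sits inside an AppleScript string.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Sample file name in the format described above
set theName to "Smith v. Jones, 555 U.S. 444 (2006)"

-- The expression from the post, with backslashes doubled for an AppleScript string.
-- Assumption: the parentheses around the court and year are escaped.
set thePattern to "(?<Party1>[A-Za-z].*)\\s+v.\\s+(?<Party2>[A-Za-z].*),\\s+(?<Volume>\\d+)\\s+(?<Reporter>[^\\s]+)\\s+(?<Pincite>\\d+)\\s+\\((?<Court>(?<=\\()(.*?(?=[1-2][0-9][0-9][0-9])))(?<Year>\\d{4})\\)"

set theString to current application's NSString's stringWithString:theName
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
if theRegex is missing value then error "The pattern did not compile"

set theMatch to theRegex's firstMatchInString:theString options:0 range:{0, theString's |length|()}
if theMatch is missing value then error "The file name did not match the pattern"

-- Pull out a few of the named groups (rangeWithName: needs macOS 10.13 or later)
set theVolume to (theString's substringWithRange:(theMatch's rangeWithName:"Volume")) as text
set theReporter to (theString's substringWithRange:(theMatch's rangeWithName:"Reporter")) as text
set theFirstPage to (theString's substringWithRange:(theMatch's rangeWithName:"Pincite")) as text
set theYear to (theString's substringWithRange:(theMatch's rangeWithName:"Year")) as text

return {theVolume, theReporter, theFirstPage, theYear}
```

Note that the group named Pincite captures the page that follows the reporter, which in a file name like the sample is the first page of the opinion. The Court group is empty for the sample because nothing sits between the opening parenthesis and the year.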

And to answer your other question: I have a fairly simple Keyboard Maestro macro that highlights the selected text and then runs the AppleScript referenced in the post. I haven’t tried (at least recently), but I expect you could get the same effect (minus the highlighting) by just running the AppleScript from the scripts menu in DT.

Thanks for sharing - nice regex.

I went the other way: I just have the case name (i.e., the parties) in the file name and then use the citation from the custom metadata when I need the citation. I use one KM combination to trigger the case name + citation with a Markdown link back to the case for my notes, and another KM combination to copy the case name + citation without a link to the clipboard for pasting into court submissions, advices, etc.

JFTR: The RE appeared to be broken, while in fact it was only rendered incorrectly. I added \ characters in front of the <s and *s so that it should be ok now. It would actually be better to post this kind of thing as code, like so:
```
code goes here and is rendered verbatim
```

Great work with the named capturing groups, BTW. One rarely sees that.