Extracting text with specific linked Bates citation

Hey DEVONthink Community,

I’ve developed an AppleScript to tackle a common headache for anyone working with Bates-numbered documents. It appends a properly formatted Bates number citation to copied text and links that citation to the exact passage on the page.

How It Works

  1. Copy with Source Link: It begins by triggering DEVONthink’s “Copy with Source Link” action, ensuring the copied text retains a direct link back to the source document.
  2. Extract and Format Bates Number: The script scrapes the document title to get the Bates number of the document’s first page* and uses it to calculate the correct Bates number for the active page.
  3. It then formats this number with the same text prefix as the document name, and adds a Markdown-style link to the text-specific link that Copy with Source Link generated.
  * This script assumes documents are named something like this: NOAA0004338.pdf, where page 1 of that document is NOAA0004338. It should work equally for 0004338.pdf, 4338.pdf, 0004338, or 4338, but would need changes for documents that don’t contain the first page’s Bates number in the title.
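The arithmetic in step 2 can be sketched on its own, without DEVONthink. This is only a minimal illustration, not the script itself: batesLabelForPage is a hypothetical helper name, and it assumes the page index counts from zero (0 is the first page), which is what the page= value in the link appears to do.

```applescript
-- Hypothetical helper: derive the Bates label for a page from the document title.
-- Assumption: pageIndex is zero-based (0 = first page), like page= in the link.
on batesLabelForPage(docTitle, pageIndex)
	set digitChars to "0123456789"
	set prefix to ""
	set digitRun to ""
	repeat with aChar in characters of docTitle
		set c to contents of aChar
		if c is in digitChars then
			set digitRun to digitRun & c
		else if digitRun is "" then
			set prefix to prefix & c -- still before the first digit
		else
			exit repeat -- the first run of digits has ended (e.g. at ".pdf")
		end if
	end repeat
	if digitRun is "" then error "No Bates number found in title: " & docTitle
	
	set batesStr to ((digitRun as integer) + pageIndex) as text
	-- Left-pad with zeros to the width of the number in the title
	repeat while (length of batesStr) < (length of digitRun)
		set batesStr to "0" & batesStr
	end repeat
	return prefix & batesStr
end batesLabelForPage

batesLabelForPage("NOAA0005220.pdf", 4) -- "NOAA0005224"
```

Padding to the width of the digits found in the title, rather than to the length of the whole title, keeps a suffix like ".pdf" from adding extra zeros.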

Example:

Suppose I select this text from page NOAA0005224 (the fifth page of the document NOAA0005220.pdf, which the link addresses as page=4 because pages are counted from zero):

Many theoretical studies have indicated that ongoing gene flow between hatchery and wild fish may ultimately compromise the fitness of the natural population.

Running this script with that text selected results in the clipboard containing this: Many theoretical studies have indicated that ongoing gene flow between hatchery and wild fish may ultimately compromise the fitness of the natural population. [NOAA0005224](x-devonthink-item://78A4B143-7D2F-4BBB-BE1B-701413F56731?page=4&start=3279&length=167&search=Many%20theoretical%20studies%20have%20indicated%20that%20ongoing%20gene%20flow%20between%20hatchery%20and%20wild%20fish%20may%20ultimately%20compromise%20the%20fitness%20of%20the%20natural%20population.)
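For anyone curious how the pieces of that link can be read back out, here is a rough sketch using text item delimiters, similar in spirit to what the script does. queryValue is a hypothetical helper name, and the sample link is the one above with the search parameter left off for brevity.

```applescript
-- Hypothetical helper: read one query parameter from an x-devonthink-item link.
on queryValue(theLink, paramName)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"?", "&"}
	set parts to text items of theLink -- item 1 is the scheme plus the UUID
	set AppleScript's text item delimiters to savedDelims -- restore right away
	
	repeat with aPart in rest of parts
		set p to contents of aPart
		if p begins with (paramName & "=") then
			-- A parameter with an empty value has nothing after the "="
			if (length of p) is ((length of paramName) + 1) then return ""
			return text ((length of paramName) + 2) thru -1 of p
		end if
	end repeat
	return missing value -- parameter not present
end queryValue

set sampleLink to "x-devonthink-item://78A4B143-7D2F-4BBB-BE1B-701413F56731?page=4&start=3279&length=167"
(queryValue(sampleLink, "page")) as integer -- 4
```

Saving and restoring the delimiters inside the helper means the rest of a script does not inherit a changed setting.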

This saves me a ton of time when I’m reviewing and extracting excerpts from administrative records.

Next Steps

I’m going to continue making it more flexible to handle documents with different naming conventions, and would love to see any improvements other folks might offer or suggest.

Stay efficient,
Sangye

tell application "System Events"
	tell application process "DEVONthink 3"
		-- Ensure the application is in front to interact with its menus
		set frontmost to true
		-- Navigate the menu structure to trigger the desired item
		click menu item "Copy with Source Link" of menu "Edit" of menu bar 1
	end tell
end tell

tell application id "DNtp"
	try
		set theClipboard to the clipboard -- Get current clipboard content
		
		-- Check if the clipboard content is a valid DEVONthink link
		if theClipboard does not contain "x-devonthink-item://" then error "Clipboard content is not valid."
		
		-- Extract the quoted text and its DEVONthink source link
		set AppleScript's text item delimiters to {"SOURCE: "}
		set clipboardItems to text items of theClipboard
		set quotedText to item 1 of clipboardItems -- The actual text copied
		set sourceLink to item 2 of clipboardItems -- DEVONthink link for the source
		
		set quotedText to my trimText(quotedText) -- Trim leading/trailing whitespace from the text
		
		-- Extract UUID and query parameters from the DEVONthink link
		set AppleScript's text item delimiters to {"x-devonthink-item://", "?", "&"}
		set linkParts to text items of sourceLink
		set docUUID to item 2 of linkParts -- Document UUID
		set queryParameters to items 3 through end of linkParts -- Additional parameters in the link
		
		-- Retrieve the document title using its UUID
		set theSelection to get record with uuid docUUID
		set theTitle to the name of theSelection -- The title, expected to contain a Bates number
		
		-- Separate the numeric part and prefix from the document's title
		set numericPart to my extractNumericPart(theTitle)
		set prefix to my extractPrefix(theTitle) -- Prefix before the numeric part (e.g., "NOAA")
		-- Pad to the width of the number in the title; measuring the digits themselves
		-- keeps a suffix such as ".pdf" from inflating the padding
		set desiredNumericLength to length of numericPart
		
		-- Determine the page number from query parameters
		set pageParam to first item of queryParameters
		set AppleScript's text item delimiters to {"="}
		set pageNumber to second item of (text items of pageParam) as number
		
		-- Calculate the Bates number considering the page offset
		set batesNumber to (numericPart as number) + pageNumber
		set batesFormatted to my formatBatesNumber(batesNumber, prefix, desiredNumericLength) -- Format with leading zeros
		
		-- Construct the markdown link with the formatted Bates number
		set newSourceLink to "x-devonthink-item://" & docUUID & "?" & my joinQueryParameters(queryParameters)
		set AppleScript's text item delimiters to "" -- Restore the default after joinQueryParameters
		set markdownLink to "[" & batesFormatted & "](" & newSourceLink & ")"
		
		-- Compile the final clipboard content and update the clipboard
		set finalClipboardContent to quotedText & " " & markdownLink & "."
		set the clipboard to finalClipboardContent
		
	on error errMsg
		display dialog "Error: " & errMsg -- Show any errors that occur
	end try
end tell

-- Formatting the Bates number with leading zeros and adding prefix
on formatBatesNumber(batesNumber, prefix, desiredNumericLength)
	set batesStr to batesNumber as text
	repeat while (length of batesStr) < desiredNumericLength -- Cannot loop forever if the number is already longer
		set batesStr to "0" & batesStr -- Pad with zeros to match the desired length
	end repeat
	return prefix & batesStr -- Return formatted Bates number with prefix
end formatBatesNumber

-- Extract numeric part from the document title
on extractNumericPart(title)
	set numericString to ""
	repeat with aChar in characters of title
		if aChar is in "0123456789" then set numericString to numericString & aChar
	end repeat
	return numericString -- Preserve leading zeros by returning as string
end extractNumericPart

-- Extract prefix (non-numeric part) from the document title
on extractPrefix(title)
	set prefix to ""
	repeat with aChar in characters of title
		if aChar is in "0123456789" then exit repeat
		set prefix to prefix & aChar
	end repeat
	return prefix
end extractPrefix

on trimText(inputText)
	-- Characters treated as whitespace: space, tab, linefeed, carriage return
	set whitespaceChars to {space, tab, linefeed, return}
	
	-- Trim leading whitespace and newlines
	repeat while inputText is not "" and (character 1 of inputText) is in whitespaceChars
		if (length of inputText) is 1 then return ""
		set inputText to text 2 thru -1 of inputText
	end repeat
	
	-- Trim trailing whitespace and newlines
	repeat while inputText is not "" and (character -1 of inputText) is in whitespaceChars
		if (length of inputText) is 1 then return ""
		set inputText to text 1 thru -2 of inputText
	end repeat
	
	return inputText
end trimText

-- Combine query parameters back into a single string for the link
on joinQueryParameters(parameters)
	set AppleScript's text item delimiters to "&"
	return parameters as string
end joinQueryParameters

Thanks for sharing this script! This should actually be doable without user interface scripting or parsing. For example, this snippet gets the basic information needed:

tell application id "DNtp"
	set theRecord to content record of think window 1
	set theUUID to uuid of theRecord
	set theTitle to name of theRecord
	set quotedText to (selected text of think window 1) as string
	set sourceLink to reference URL of think window 1
end tell

edit – it’s very late and I’m dumb. You meant only to replace the Copy with Source Link interface menu call. That’s a helpful addition – thank you 🙂

…and therefore also most of the script up to this line:

-- Separate the numeric part and prefix from the document's title

So I went ahead and did this, but have hit an obstacle.

The code below will generally work, but it is not as precise as Copy with Source Link because it lacks the start parameter, which is supposed to represent the position of the first character of the quoted text on the page where it begins. How can I get that in AppleScript?

tell application id "DNtp"
	try
		-- Retrieve record details directly
		set theRecord to content record of think window 1
		set pageNumber to current page of think window 1
		set theContent to plain text of theRecord
		set theUUID to uuid of theRecord
		set theTitle to name of theRecord
		set quotedText to (selected text of think window 1) as string
		set sourceLink to reference URL of theRecord
	
		
		-- Separate the numeric part and prefix from the document's title
		set numericPart to my extractNumericPart(theTitle)
		set prefix to my extractPrefix(theTitle) -- Prefix before the numeric part (e.g., "NOAA")
		-- Pad to the width of the number in the title, so a suffix such as ".pdf" cannot inflate it
		set desiredNumericLength to length of numericPart
		
		-- Calculate the Bates number considering the page offset
		set batesNumber to (numericPart as number) + pageNumber
		set batesFormatted to my formatBatesNumber(batesNumber, prefix, desiredNumericLength) -- Format with leading zeros
		
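		-- Rely on Perl for URL encoding; the text is passed as an argument so that
		-- quotes or braces in the selection cannot break the shell command
		set theText to quotedText
		set encodedText to do shell script "perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' " & quoted form of theText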
		
		-- Calculate the precise length
		-- set theLength to count characters of quotedText
		
		-- set newSourceLink to "x-devonthink-item://" & theUUID & "?" & my joinQueryParameters(queryParameters)
		set markdownLink to "[" & batesFormatted & "](" & sourceLink & "?page=" & pageNumber & "&search=" & encodedText & ")"
		
		-- Compile the final clipboard content and update the clipboard
		set finalClipboardContent to quotedText & " " & markdownLink & "."
		set the clipboard to finalClipboardContent
		
	on error errMsg
		display dialog "Error: " & errMsg -- Show any errors that occur
	end try
end tell

-- Formatting the Bates number with leading zeros and adding prefix
on formatBatesNumber(batesNumber, prefix, desiredNumericLength)
	set batesStr to batesNumber as text
	repeat while (length of batesStr) < desiredNumericLength -- Cannot loop forever if the number is already longer
		set batesStr to "0" & batesStr -- Pad with zeros to match the desired length
	end repeat
	return prefix & batesStr -- Return formatted Bates number with prefix
end formatBatesNumber

-- Extract numeric part from the document title
on extractNumericPart(title)
	set numericString to ""
	repeat with aChar in characters of title
		if aChar is in "0123456789" then set numericString to numericString & aChar
	end repeat
	return numericString -- Preserve leading zeros by returning as string
end extractNumericPart

-- Extract prefix (non-numeric part) from the document title
on extractPrefix(title)
	set prefix to ""
	repeat with aChar in characters of title
		if aChar is in "0123456789" then exit repeat
		set prefix to prefix & aChar
	end repeat
	return prefix
end extractPrefix

-- Combine query parameters back into a single string for the link
on joinQueryParameters(parameters)
	set AppleScript's text item delimiters to "&"
	return parameters as string
end joinQueryParameters

Just use the reference URL property of the window/tab as suggested instead of the property of the record.

And lose the text-specific link precision? I don’t think so! I’d rather revert to my interface menu parsing method.

Did you actually have a look at the different results of these properties?

Yes, after realizing I’d again misunderstood your message. It works great! Many thanks, cheers,


tell application id "DNtp"
	try
		-- Retrieve record details directly
		set theRecord to content record of think window 1
		set pageNumber to current page of think window 1
		set theContent to plain text of theRecord
		set theUUID to uuid of theRecord
		set theTitle to name of theRecord
		set quotedText to (selected text of think window 1) as string
		set sourceLink to reference URL of think window 1
		
		
		-- Separate the numeric part and prefix from the document's title
		set numericPart to my extractNumericPart(theTitle)
		set prefix to my extractPrefix(theTitle) -- Prefix before the numeric part (e.g., "NOAA")
		-- Pad to the width of the number in the title, so a suffix such as ".pdf" cannot inflate it
		set desiredNumericLength to length of numericPart
		
		-- Calculate the Bates number considering the page offset
		set batesNumber to (numericPart as number) + pageNumber
		set batesFormatted to my formatBatesNumber(batesNumber, prefix, desiredNumericLength) -- Format with leading zeros
		
		-- No manual URL encoding is needed here: the window's reference URL already
		-- carries the page and selection parameters
		set markdownLink to "[" & batesFormatted & "](" & sourceLink & ")"
		
		-- Compile the final clipboard content and update the clipboard
		set finalClipboardContent to quotedText & " " & markdownLink & "."
		set the clipboard to finalClipboardContent
		
	on error errMsg
		display dialog "Error: " & errMsg -- Show any errors that occur
	end try
end tell

-- Formatting the Bates number with leading zeros and adding prefix
on formatBatesNumber(batesNumber, prefix, desiredNumericLength)
	set batesStr to batesNumber as text
	repeat while (length of batesStr) < desiredNumericLength -- Cannot loop forever if the number is already longer
		set batesStr to "0" & batesStr -- Pad with zeros to match the desired length
	end repeat
	return prefix & batesStr -- Return formatted Bates number with prefix
end formatBatesNumber

-- Extract numeric part from the document title
on extractNumericPart(title)
	set numericString to ""
	repeat with aChar in characters of title
		if aChar is in "0123456789" then set numericString to numericString & aChar
	end repeat
	return numericString -- Preserve leading zeros by returning as string
end extractNumericPart

-- Extract prefix (non-numeric part) from the document title
on extractPrefix(title)
	set prefix to ""
	repeat with aChar in characters of title
		if aChar is in "0123456789" then exit repeat
		set prefix to prefix & aChar
	end repeat
	return prefix
end extractPrefix


-- Combine query parameters back into a single string for the link
on joinQueryParameters(parameters)
	set AppleScript's text item delimiters to "&"
	return parameters as string
end joinQueryParameters

Glad to help.