I want to post a script that I hope may be helpful for some other DT users out there. The script fetches bibliographic metadata for scientific papers from the NASA / Harvard Astrophysics Data System (ADS). If you are working in astronomy or related fields, even as a hobbyist, you will be very familiar with the ADS. For many historical papers, it is the only source for digital copies.
I found myself with a big collection of these historical PDFs and wanted an easy way to get bibliographic metadata as well as tags into my database. The attached script is tailored to exactly this situation: you have PDFs from the ADS and a set of custom metadata fields. The script even allows localized or user-defined field names. The downside is that you will have to edit two properties before you can use it: one for this field-name mapping, the other for your API token (which you can get for free by creating an account at https://ui.adsabs.harvard.edu ).
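For reference, here is what the two edited properties might look like for an English-language DEVONthink setup. The field names on the right and the token placeholder are hypothetical examples. They must match the custom metadata fields you actually defined in DEVONthink's settings, character for character.

```applescript
-- Hypothetical English field names: replace them with the names of your own custom metadata fields.
property fieldMapping : {¬
	{"title", "Title"}, ¬
	{"author", "Authors"}, ¬
	{"pubdate", "Date"}, ¬
	{"pub", "Published in"}, ¬
	{"volume", "Volume"}, ¬
	{"year", "Year"}, ¬
	{"page", "Page"}, ¬
	{"doi", "DOI"}, ¬
	{"isbn", "ISBN"} ¬
		}
-- Placeholder only: paste the token from your ADS / SciX account settings here.
property api_token : "YOUR-ADS-API-TOKEN"
```

The keys on the left are the ADS API field names and stay unchanged. Only the DEVONthink field names and the token need editing.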
I have to admit that I had no previous experience with AppleScript, so I used Gemini as an LLM-based coding assistant. That turned out to be quite helpful, because I found out along the way that AppleScript apparently can't handle JSON very well and can't extract details from within PDFs. So in order to parse JSON and detect links in PDFs, the script uses some rather annoying and unreadable Objective-C-based (AppleScriptObjC) tricks, and it became huge… Because of this, there may well be errors or stylistic catastrophes left in the code. It works fine for its intended purpose, however.
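For anyone curious what these tricks look like, here is a minimal standalone sketch of the JSON part. It assumes macOS 10.10 or later (AppleScriptObjC in plain scripts). It parses a trimmed copy of the sample API response from the script's documentation instead of calling the API, so it runs in Script Editor without a token.

```applescript
use AppleScript version "2.4" -- AppleScriptObjC in ordinary scripts needs macOS 10.10+
use framework "Foundation"
use scripting additions

-- A trimmed-down copy of the sample ADS response from the script's documentation
set jsonText to "{\"response\":{\"numFound\":1,\"docs\":[{\"year\":\"1995\",\"author\":[\"Wilson, A. S.\",\"Colbert, E. J. M.\"]}]}}"

-- NSJSONSerialization works on NSData, so convert the AppleScript text to UTF-8 data first
set theData to ((current application's NSString's stringWithString:jsonText)'s dataUsingEncoding:(current application's NSUTF8StringEncoding))
set theJSON to current application's NSJSONSerialization's JSONObjectWithData:theData options:0 |error|:(missing value)

-- A key path reaches into the nested dictionaries; "response.docs" is an NSArray of records
set firstDoc to (theJSON's valueForKeyPath:"response.docs")'s firstObject()
set theYear to (firstDoc's valueForKey:"year") as string
-- Multi-valued fields come back as NSArray; join them into one string
set theAuthors to ((firstDoc's valueForKey:"author")'s componentsJoinedByString:"; ") as string
{theYear, theAuthors}
```

Script Editor's result pane should show the year and the semicolon-joined author list. The script below uses the same `valueForKeyPath:` and `componentsJoinedByString:` calls on the real API response.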
I would be very happy about any feedback! However, since I currently have very little time, I can't promise I'll be able to incorporate suggestions within a reasonable timeframe.
use framework "Foundation"
use framework "PDFKit"
use scripting additions
-- ---------------------------------------------------------------------------------
-- Customize these two properties to match your own setup.
(* This record maps the field names from the ADS API JSON response
to the custom metadata field names you have defined in DEVONthink.
The keys are the JSON field names as used in the API and documented at
https://ui.adsabs.harvard.edu/help/api/api-docs.html#servers, and
the values are your localized / custom DEVONthink field names.
*)
property fieldMapping : {¬
{"title", "Originaltitel"}, ¬
{"author", "Autoren"}, ¬
{"pubdate", "Datum"}, ¬
{"pub", "Erschienen in"}, ¬
{"volume", "Volume"}, ¬
{"year", "Erscheinungsjahr"}, ¬
{"page", "Page"}, ¬
{"doi", "DOI"}, ¬
{"isbn", "ISBN"} ¬
}
(* Paste your API token into this string. *)
property api_token : ""
-- ---------------------------------------------------------------------------------
(*
=================================================================================
SCRIPT DOCUMENTATION
=================================================================================
WHAT THIS SCRIPT DOES:
This script automates adding metadata to academic papers from the NASA Astrophysics Data System (ADS)
within DEVONthink. It performs the following steps:
1. It inspects the selected document to find its unique 19-character "bibcode". It can find the bibcode from the
document's filename, from an ADS URL, or by scanning the content of a PDF for a watermark link.
2. It uses this bibcode to query the official ADS API to retrieve detailed metadata about the paper.
3. It parses the API's response and populates your custom / localized metadata fields in DEVONthink (as defined in the
`fieldMapping` property).
4. It sets the document's URL to the ADS abstract page and adds any keywords from the API as DEVONthink tags.
WHAT IS THE ADS?
The NASA Astrophysics Data System (ADS) is a digital library and online database of scientific papers with a strong
focus on astronomy and astrophysics. See https://ui.adsabs.harvard.edu/ or, since the UI is currently transitioning
to a new version, https://scixplorer.org.
IMPORTANT LIMITATIONS:
This script is specifically designed to work with documents that have an ADS bibcode. It will **not** work for papers
downloaded directly from other sources like arXiv, journal publisher websites (e.g., Elsevier, Springer), or other
academic repositories, even if the papers are *also* in the ADS -- the script requires bibcodes to be present.
SAMPLE API RESPONSE:
The script expects a JSON response from the API. The `fieldMapping` property is used to map the keys from this JSON
(e.g., "title", "author") to your custom fields in DEVONthink. A typical response for a single document looks like this:
$ curl -H "Authorization: Bearer ..." "https://api.adsabs.harvard.edu/v1/search/query?q=bibcode:1995ApJ...438...62W&fl=title,issue,keyword,pub,title,volume,year,pubdate,author"
{
"responseHeader":{
"status":0,
"QTime":6,
"params":{
"q":"bibcode:1995ApJ...438...62W",
"fl":"title,issue,keyword,pub,title,volume,year,pubdate,author",
"start":"0",
"internal_logging_params":"X-Amzn-Trace-Id=Root=1-692213f1-0c85a8155426ae546f3e1840",
"rows":"10",
"wt":"json"}},
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"author":["Wilson, A. S.",
"Colbert, E. J. M."],
"keyword":["Active Galactic Nuclei",
"Black Holes (Astronomy)",
"Luminosity",
"Radio Jets (Astronomy)",
"Cosmology",
"Interacting Galaxies",
"Quasars",
"Radio Astronomy",
"Astrophysics",
"BLACK HOLE PHYSICS",
"GALAXIES: ACTIVE",
"GALAXIES: INTERACTIONS",
"GALAXIES: NUCLEI",
"GALAXIES: QUASARS: GENERAL",
"RADIO CONTINUUM: GALAXIES",
"Astrophysics"],
"pub":"The Astrophysical Journal",
"pubdate":"1995-01-00",
"title":["The Difference between Radio-loud and Radio-quiet Active Galaxies"],
"volume":"438",
"year":"1995"}]
}}
*)
-- Without an API token every request would fail, so stop right away.
if api_token is "" then
display dialog "No API Token Found" & return & return & ¬
"This script queries the ADS / SciX API, which requires an API token to function. " & ¬
"Register for a free account at https://scixplorer.org and create a token in your account settings. " & ¬
"After that, edit this script and paste your token into the 'api_token' property." & return & return & ¬
"The script is located at:" & return & ((path to me) as string) ¬
buttons {"OK"} default button "OK" with icon stop
return
end if
tell application id "DNtp"
set theSelection to selection
if (count of theSelection) is 0 then
display dialog "Please select a document in DEVONthink first." buttons {"OK"} default button "OK"
return
end if
set theDocument to item 1 of theSelection
set theName to name of theDocument
set detectedBibcode to ""
-- Discover the bibcode of the selected document. There are three possibilities:
-- 1. Downloaded documents will have a bibcode as their name. That's easy.
-- (The name is decoded first, so a percent-encoded name like "2023A%26A...678A.135H" passes the length check, too.)
if my isBibcode(my decodeString(theName)) then
set detectedBibcode to my decodeString(theName)
-- 2. Files may have an ADS URL as their name.
else if "adsabs.harvard.edu/" is in theName then
set rawBibcode to my getRawBibcodeFromADSURL(theName)
set detectedBibcode to my decodeString(rawBibcode)
-- 3. Otherwise, assume a PDF was downloaded from the ADS nonetheless
-- and scan it for the watermark link that every modern ADS PDF has.
else if type of theDocument is PDF document then
set foundURL to my scanPDFforADSLink(path of theDocument)
if foundURL is not "" then
set detectedBibcode to my decodeString(my getRawBibcodeFromADSURL(foundURL))
end if
end if
-- If we have a valid bibcode, query the API and set the metadata
if my isBibcode(detectedBibcode) then
set encodedBibcode to my encodeString(detectedBibcode)
my updateMetadataFromAPI(encodedBibcode, theDocument)
else
display dialog "Could not find a valid Bibcode in the selected document."
end if
end tell
(**
* Checks if a given string is a valid 19-character Bibcode.
* There is no single regex that covers all cases, due to the peculiarities described at https://ui.adsabs.harvard.edu/help/actions/bibcode
* (and AppleScript has no built-in regex support), so let's just use super simple checks here --
* worst case is an empty result set from the API.
*
* @param bibcodeString The string to validate.
* @return true if the string is 19 characters long and contains only characters allowed in bibcodes, false otherwise.
*)
on isBibcode(bibcodeString)
-- First, check for the correct length. This is the fastest check.
if (count of bibcodeString) is not 19 then
log "Bibcode failed length check: " & bibcodeString & " with length: " & (count of bibcodeString) as string
return false
end if
-- AppleScript has no built-in regex support, so check every character against an allow-list instead.
set allowedChars to "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.&+!-"
repeat with aChar in (characters of bibcodeString)
if aChar is not in allowedChars then
log "Found invalid character in Bibcode: " & aChar
return false
end if
end repeat
return true
end isBibcode
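(**
* Optional sketch, not called anywhere in this script: a hypothetical regex-based variant of isBibcode
* that uses NSRegularExpression through AppleScriptObjC instead of AppleScript's own string handling.
* Assumption: every bibcode starts with a four-digit year, followed by 15 characters from the same
* allow-list as above. That makes it slightly stricter than isBibcode, but it is still no full validation.
* To try it, call "my isBibcodeRegex(...)" wherever the main block calls "my isBibcode(...)".
*)
on isBibcodeRegex(bibcodeString)
-- "\\&" and "\\-" become \& and \- in the pattern, i.e. a literal "&" and "-" inside the character class
set thePattern to "^[0-9]{4}[A-Za-z0-9.\\&+!\\-]{15}$"
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set theNSString to current application's NSString's stringWithString:bibcodeString
-- The pattern is anchored with ^ and $, so there is either exactly one match or none
set matchCount to theRegex's numberOfMatchesInString:theNSString options:0 range:{location:0, |length|:(theNSString's |length|())}
return (matchCount as integer) is 1
end isBibcodeRegex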
(**
* Extracts the raw (possibly still percent-encoded) bibcode from a full ADS URL.
* It simply returns the last path component; decoding and validation happen in the caller.
* There are two URL variants: Those starting with adsabs.harvard.edu/abs/ (which is used in the embedded
* watermark-style links in PDFs) and those starting with adsabs.harvard.edu/pdf/
* (when storing a PDF directly from the download page, DT uses this as the filename).
*
* @param urlStr The full URL string, e.g., "https://adsabs.harvard.edu/abs/2023A%26A...678A.135H".
* @return The last path component of the URL (the raw bibcode, e.g. "2023A%26A...678A.135H"), or an empty string if splitting fails.
*)
on getRawBibcodeFromADSURL(urlStr)
-- To handle both "adsabs.harvard.edu/abs/" and "adsabs.harvard.edu/pdf/" URLs,
-- we can split the URL by "/" and take the last item, which will be the bibcode.
set oldDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "/"
try
set rawBibcode to the last text item of urlStr
on error
log "Error while splitting URL: " & urlStr
set AppleScript's text item delimiters to oldDelimiters
return ""
end try
set AppleScript's text item delimiters to oldDelimiters
return rawBibcode
end getRawBibcodeFromADSURL
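(**
* Optional sketch, not called anywhere in this script: a hypothetical variant of getRawBibcodeFromADSURL
* that returns the path component following "abs" or "pdf" instead of the last one. It would also cover
* links with a suffix after the bibcode, such as "https://ui.adsabs.harvard.edu/abs/1995ApJ...438...62W/abstract",
* where taking the last component would return "abstract". Whether or not NSURL has already percent-decoded
* the component, passing the result through decodeString (as the main block does) yields the same bibcode.
*)
on getRawBibcodeFromADSPath(urlStr)
set theURL to current application's NSURL's URLWithString:urlStr
if theURL is missing value then return ""
set theParts to (theURL's pathComponents()) as list
-- Stop one short of the end, because we need the component after "abs" / "pdf"
repeat with i from 1 to (count of theParts) - 1
if item i of theParts is in {"abs", "pdf"} then return item (i + 1) of theParts
end repeat
return ""
end getRawBibcodeFromADSPath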
(**
* Find an ADS watermark.
* Watermarked documents have a vertical link beginning with "https://adsabs.harvard.edu/abs/" on every page.
* This method looks at PDF link annotations and tries to find one starting with that string.
*
* @param pdfPath The full POSIX path to the PDF file (e.g., from DEVONthink's 'path of theDocument').
* @return The found URL as a string, or an empty string if no matching link is found.
*)
on scanPDFforADSLink(pdfPath)
-- Create a PDFDocument object from the file path using PDFKit
try
set theURL to current application's NSURL's fileURLWithPath:pdfPath
set thePDF to current application's PDFDocument's alloc()'s initWithURL:theURL
if thePDF is missing value then
log "Error: Could not open the PDF file at path: " & pdfPath
return ""
end if
on error errMsg
log "Error creating PDF object: " & errMsg
return ""
end try
-- Check PDF for links via annotations
-- (only the first page -- if there is an ADS link block, it's present on every page)
set numPages to thePDF's pageCount()
if numPages > 0 then
set thePage to (thePDF's pageAtIndex:0) -- Get the first page (index 0)
set theAnnotations to thePage's annotations()
-- Is one of the annotations a link to adsabs.harvard.edu/abs/? If so, return it.
repeat with anAnnotation in theAnnotations
if (anAnnotation's type() as string) is equal to "Link" then
-- Some link annotations point to a page destination instead of a URL; skip those
set linkURL to anAnnotation's |URL|()
if linkURL is not missing value then
set urlString to (linkURL's absoluteString()) as string
if "adsabs.harvard.edu/abs/" is in urlString then
return urlString
end if
end if
end if
end repeat
end if
return ""
end scanPDFforADSLink
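(**
* Optional sketch, not called anywhere in this script: a hypothetical diagnostic handler that lists the
* URLs of all link annotations on the first page of a PDF. If scanPDFforADSLink finds nothing, running
* this on the same path from Script Editor shows which links PDFKit actually sees.
*)
on listLinkURLsOnFirstPage(pdfPath)
set foundURLs to {}
set theURL to current application's NSURL's fileURLWithPath:pdfPath
set thePDF to current application's PDFDocument's alloc()'s initWithURL:theURL
-- initWithURL: returns missing value for unreadable or nonexistent files
if thePDF is missing value then return foundURLs
if (thePDF's pageCount()) < 1 then return foundURLs
set theAnnotations to ((thePDF's pageAtIndex:0)'s annotations()) as list
repeat with i from 1 to count of theAnnotations
set anAnnotation to item i of theAnnotations
-- Annotations without a URL (e.g. internal page links) return missing value
set linkURL to anAnnotation's |URL|()
if linkURL is not missing value then set end of foundURLs to (linkURL's absoluteString()) as string
end repeat
return foundURLs
end listLinkURLsOnFirstPage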
(**
* Percent-encodes a string for safe use in URLs.
*
* @param theString The string to encode.
* @return The URL-encoded string.
*)
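on encodeString(theString)
set theNSString to (current application's NSString's stringWithString:theString)
-- Leave only RFC 3986 "unreserved" characters unescaped. URLQueryAllowedCharacterSet would keep "&" and "+"
-- as they are, so a bibcode like "2023A&A...678A.135H" would be cut off at the "&" inside the query string.
set allowedChars to (current application's NSCharacterSet's characterSetWithCharactersInString:"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
set encodedString to (theNSString's stringByAddingPercentEncodingWithAllowedCharacters:allowedChars)
return encodedString as string
end encodeString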
(**
* Decodes a percent-encoded string from a URL.
*
* @param theString The string to decode.
* @return The decoded string, or an empty string on failure.
*)
on decodeString(theString)
set theNSString to (current application's NSString's stringWithString:theString)
set decodedString to (theNSString's stringByRemovingPercentEncoding())
if decodedString is missing value then
return ""
end if
return decodedString as string
end decodeString
(**
* Queries the ADS API with a given bibcode, parses the response, and updates a DEVONthink record with the retrieved metadata.
*
* This handler performs the core logic of fetching data from the ADS API and populating the fields of a DEVONthink record.
* It constructs the API request, adds the necessary authorization header using the `api_token` property, and sends the request.
* Upon receiving a valid response, it parses the JSON and iterates through the `fieldMapping` property to match API fields
* to DEVONthink custom metadata fields. It handles multi-valued fields (like authors and keywords) by joining them into a
* single string. Finally, it sets the record's main URL to the ADS abstract page and applies any found keywords as tags.
*
* @param theBibcode The URL-encoded bibcode of the document (19 characters before encoding).
* @param theRecord A reference to the DEVONthink record that will be updated.
*)
on updateMetadataFromAPI(theBibcode, theRecord)
-- Construct the query URL from the bibcode
set requestURLString to "https://api.adsabs.harvard.edu/v1/search/query?q=bibcode:" & theBibcode & "&fl=title,author,pubdate,pub,volume,year,page,doi,isbn,keyword"
set requestURL to current application's NSURL's URLWithString:requestURLString
-- Create a mutable request to add the authorization header
set theRequest to current application's NSMutableURLRequest's requestWithURL:requestURL
theRequest's setValue:("Bearer " & api_token) forHTTPHeaderField:"Authorization"
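-- Perform the request
set {theData, theResponse, theError} to current application's NSURLConnection's sendSynchronousRequest:theRequest returningResponse:(reference) |error|:(reference)
-- Report network errors and non-200 HTTP responses (e.g. for an invalid token) instead of failing silently
if theData is missing value then
set errText to "unknown error"
if theError is not missing value then set errText to (theError's localizedDescription()) as string
display dialog "The ADS API request failed: " & errText buttons {"OK"} default button "OK"
return
end if
set httpStatus to (theResponse's statusCode()) as integer
if httpStatus is not 200 then
display dialog "The ADS API returned HTTP status " & httpStatus & ". Please check your API token." buttons {"OK"} default button "OK"
return
end if
-- Parse the JSON response from the API and put metadata into DEVONthink
if theData is not missing value then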
try
set theJSON to (current application's NSJSONSerialization's JSONObjectWithData:theData options:0 |error|:(missing value))
set docsKey to (theJSON's valueForKeyPath:"response.docs")
if (docsKey's |count|()) > 0 then
set theDocument to (docsKey's objectAtIndex:0)
-- Extract keywords for later use as tags. This is an array.
set theKeywords to (theDocument's valueForKey:"keyword")
-- All other response fields go into custom metadata fields:
repeat with aMapping in fieldMapping
set jsonKey to item 1 of aMapping
set theValue to (theDocument's valueForKey:jsonKey)
set finalValue to ""
set dtFieldName to item 2 of aMapping
if theValue is not missing value then
-- Arrays (like authors or keywords) are joined to serialize them into a String.
if (theValue's isKindOfClass:(current application's NSArray's |class|())) then
set finalValue to (theValue's componentsJoinedByString:"; ") as string
else
set finalValue to theValue as string
end if
-- Add the processed value to DEVONthink's custom metadata.
tell application id "DNtp"
add custom meta data finalValue for dtFieldName to theRecord
end tell
end if
end repeat
-- Set DEVONthink's URL field to the paper's ADS landing page as that is universal
tell application id "DNtp"
set (url of theRecord) to ("https://adsabs.harvard.edu/abs/" & theBibcode)
end tell
-- If we found keywords, add them as tags in DEVONthink (existing tags are kept).
if theKeywords is not missing value then
tell application id "DNtp"
set (tags of theRecord) to (tags of theRecord) & (theKeywords as list)
end tell
end if
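else
display dialog "The ADS returned no record for this bibcode." buttons {"OK"} default button "OK"
end if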
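on error errMsg
log "JSON Parsing Error: " & errMsg
display dialog "Could not process the ADS API response: " & errMsg buttons {"OK"} default button "OK"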
end try
end if
end updateMetadataFromAPI







