Script to look up PDF document metadata on crossref.org

I hate AppleScript. Really hate it. But I love DEVONthink, and had an itch to scratch. This is somewhat similar to other scripts that do bibliographic lookups, but it queries crossref.org.

Feedback, etc. welcome.


Looks like an interesting script. However, as it requires either a text selection or a certain name I wonder what kind of input crossref.org accepts/expects.

By the way, do your documents contain a DOI (digital object identifier)? Then the smart rule script Download Bibliographic Metadata might be useful too.

Okay, I should obviously read the comments how to use it first :slight_smile:

It uses crossref’s query.bibliographic, which allows pretty broad input; authors, ISBNs, titles, etc.
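To illustrate what such a request looks like, here is a minimal sketch that builds the URL for a free-form query. The function name `buildCrossrefURL` is made up for this example; the endpoint and the `rows` parameter are the ones the script itself passes to curl.

```javascript
// Hypothetical helper: build a Crossref /works URL for a free-form query.
// query.bibliographic accepts loose input such as authors, titles or ISBNs.
function buildCrossrefURL(query, rows = 5) {
  const params = { "query.bibliographic": query, rows: String(rows) };
  // Percent-encode every key and value so spaces and punctuation survive.
  const queryString = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://api.crossref.org/works?${queryString}`;
}

console.log(buildCrossrefURL("Smith 2019 caching", 3));
```

The script leaves the encoding to curl's `--data-urlencode` instead, which amounts to the same thing.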

My usual workflow when I download papers is to set the name of the paper using control-command-I (awesome feature, was so happy when I found that); that’s why it takes the name if you don’t select any text.

Some papers contain DOIs, but many don’t :frowning: I might make it a bit more nuanced if there’s existing metadata…

For hating AppleScript, it looks like you made something useful… and I hope that itch is gone now :wink:


Hi, I followed your instructions (I think!), but when I select the script in DT, nothing happens?

Since I’m not particularly fond of AS either, I took the liberty of rewriting the script in JavaScript. It seems to work (tested it with one record only, though). Also, I stumbled upon API results without a title field; the script then fills in a stupid placeholder. Don’t know if that is reasonable or not.
Here goes

(() => {
  const app = Application("DEVONthink 3");
  /* Need currentApplication() for user interaction */
  const curApp = Application.currentApplication();
  curApp.includeStandardAdditions=true;
  /* Basic error checking */
  const thinkWindow = app.thinkWindows();
  const contentRec = app.contentRecord();
  if (!thinkWindow|| !thinkWindow[0]) throw "No window open";
  if (!contentRec) throw "No document selected";

  /* Query for either selected text or the name of the record if no text is selected */
  const query = thinkWindow[0].selectedText() || contentRec.name();
  const apiURL = 'https://api.crossref.org/works';
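  /* Escape single quotes in the query so names like "Don't ..." do not break the shell command */
  const shellCmd = `curl -A "(https://gist.github.com/mnot/0d7825bde9b9d3233f623c71765f20ca)" -G ${apiURL} --data-urlencode query.bibliographic='${query.replace(/'/g, "'\\''")}' -d rows=5 -d select=author,title,created,type,publisher,published,subject`;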
  const apiResult = JSON.parse(curApp.doShellScript(shellCmd));

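  /* Basic error checking for the return value of the API call */
  if (apiResult.status !== "ok") {
    /* '+' concatenates in JS ('&' is AppleScript); message may be an object, so serialize it */
    throw "API response not OK: " + JSON.stringify(apiResult.message);
  }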
  const itemList = apiResult.message.items;
  if (itemList.length === 0) {
    curApp.displayAlert("No matches found!");
    return;
  }

  /* Arrays to save the dates and authors, no need to extract them twice */
  const choices = [], dates = [], authors = [];

  /* Build list of choices from results */
  itemList.forEach(item => {
    const title = item.title ? item.title[0] : "No Title?";
    let detailString = extractAuthor(item);
    authors.push(detailString);
    const dateString = extractDate(item);
    dates.push(dateString);
    detailString += `${detailString ? ', ' : ''}${dateString}`;
    choices.push(`${title} (${detailString})`);
  });
  const selection = curApp.chooseFromList(choices, {withPrompt: "Select:"});
  if (!selection) return;
  
  /* User selected from the list of references: get the selected index */
  const selectedIndex = choices.indexOf(selection[0]);

  /* get the corresponding item from the API result */
  const selectedItem = apiResult.message.items[selectedIndex];

  /* get the corresponding date */
  const selectedDate = dates[selectedIndex].split('-');
  selectedDate[1]--; /* months are 0-indexed in JS! */
  
  /* Set the record's date to the date of the selected item */
  
  contentRec.date = new Date(...selectedDate);

  /* add the 'type' field from the result to the record's tags */
  const typeTag = selectedItem.type;
  const recordTags = contentRec.tags();
  if (recordTags.indexOf(typeTag) === -1) {
    recordTags.push(typeTag);
    contentRec.tags = recordTags;
  }
  
  /* Set PDF metadata fields Title and Author */
  const recordPath = contentRec.path();
  /* Some API results have no title field; skip the Title tag in that case */
  if (selectedItem.title) {
    setPDFMetadata(curApp, "Title", selectedItem.title[0], recordPath);
  }
  setPDFMetadata(curApp, "Author", authors[selectedIndex], recordPath);
})()


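function setPDFMetadata(curApp, key, value, path) {
  const pathToExiftool = '/usr/local/bin/exiftool';
  /* Single-quote an argument for the shell, escaping embedded single quotes */
  const quote = s => "'" + String(s).replace(/'/g, "'\\''") + "'";
  const shellCommand = `${pathToExiftool} -overwrite_original ${quote(`-${key}=${value}`)} ${quote(path)}`;
  curApp.doShellScript(shellCommand);
}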

/* Extract the first author from one item of the API result */
function extractAuthor(item) {
  if (!item.author) return "";
  const firstAuthor = item.author[0];
  let authorString = `${firstAuthor.given || ""} ${firstAuthor.family || ""}`;
  return authorString.trim();
}

/* extract the published date from the API result. 
If it does not exist or is not complete, use create date */
function extractDate(item) {
  let year, month, day;
  if (item.published && item.published['date-parts']) {
    [year, month, day] = item.published['date-parts'][0];
  } 
  /* if any of year, month or day is still undefined, get all three values from the 'created' field */
  if (!(year && month && day)) {
    [year, month, day] = item.created['date-parts'][0];
  }   
  return `${year}-${month}-${day}`;
}  

Thanks for sharing the script @mnot.

I’m encountering the same issue as @extracampine after following the installation instructions.

Has anyone found the cause or, even better, a way to fix this?

Is anything reported in DT’s log window?

Nothing is logged and there is no visible sign that anything is happening otherwise.

You could run the script from Script Editor and open its message area. Then you’ll see all Apple events happening (or not), which might give some ideas as to what’s going on (or not).

That was a great hint, thanks!
The script shows the selection dialogue as expected if I run it through Script Editor. If I instead run the same script by clicking on the entry in DT’s script menu, nothing happens. Will try some stuff and report back if something solves it.

// Edit: It’s working now in DT! In case it helps someone, these are the steps that seem to have worked:

Apparently Homebrew does not always install applications to the /usr/local/bin path, which is the path referenced in the script. After not finding Exiftool there although it was already installed, I found it in my Homebrew install directory and changed the path in the script accordingly. Then, after running the script through Script Editor once and trying again in DT, a dialog appeared, which I confirmed. It’s now fully functional - looking forward to further testing.
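In case it saves someone the manual edit, here is a rough sketch of checking a few likely locations instead of hard-coding one. `findExecutable` and the candidate list are my own assumptions, not part of the script; /opt/homebrew/bin is Homebrew's default prefix on Apple Silicon Macs, /usr/local/bin the one on Intel Macs.

```javascript
// Hypothetical helper: return the first candidate path for which `exists`
// reports true, or null if none of them is present.
function findExecutable(candidates, exists) {
  return candidates.find(path => exists(path)) || null;
}

// Assumed install locations; adjust to your own setup.
const EXIFTOOL_CANDIDATES = [
  "/usr/local/bin/exiftool",    // Homebrew on Intel Macs
  "/opt/homebrew/bin/exiftool", // Homebrew on Apple Silicon Macs
];

// Demo with a stand-in predicate that pretends only /opt/homebrew exists.
const found = findExecutable(EXIFTOOL_CANDIDATES, p => p.startsWith("/opt/homebrew/"));
console.log(found || "exiftool not found in any candidate location");
```

Inside a JXA script the predicate would have to ask the file system; one option, untested here, is `ObjC.import('Foundation')` followed by `p => $.NSFileManager.defaultManager.fileExistsAtPath(p)`. Showing an alert when the result is null would also avoid the silent failure.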

Good to see that it works now. Bad to see that Apple(Script) doesn’t throw an error when it can’t find the external program.

And another argument for going with self-contained scripts as much as possible. The same holds, btw, for JSONHelper. If one used JavaScript instead of AppleScript, these tools would not be necessary at all. Less stuff to install, less cruft, fewer problems.

It would actually be possible to set the PDF metadata using Objective-C from the script, avoiding exiftool completely. Probably more elegantly, too, than calling exiftool for Title and Author separately.
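Short of going the Objective-C route, the two exiftool calls could at least be merged, since exiftool accepts several tag assignments in one invocation. A minimal sketch, assuming the helper names `shellQuote` and `buildExiftoolCommand` (both invented for this example):

```javascript
// Wrap a value in single quotes for sh, escaping embedded single quotes.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: build ONE exiftool command that writes all given tags.
// `tags` is an object such as { Title: "...", Author: "..." }.
function buildExiftoolCommand(exiftoolPath, tags, filePath) {
  const assignments = Object.entries(tags)
    .map(([key, value]) => shellQuote(`-${key}=${value}`));
  return [
    shellQuote(exiftoolPath),
    "-overwrite_original",
    ...assignments,
    shellQuote(filePath),
  ].join(" ");
}

console.log(buildExiftoolCommand(
  "/usr/local/bin/exiftool",
  { Title: "It's a test", Author: "A B" },
  "/tmp/a.pdf"
));
```

The resulting string can be handed to `doShellScript` once, so the PDF is rewritten a single time instead of twice.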

I’ve updated the script, including the installation instructions; see how that goes for you (note especially the osacompile step).