Script: Add page label to PDF annotations

In a recent thread, a user asked whether the page label can be added to a PDF annotation. In some documents, the printed page number (label) does not coincide with the logical page number, for example if the document is part of a larger publication or if its page numbers start with Roman numerals for the introduction.

The following script works with a selection of annotation files in Markdown format. The corresponding PDF documents have to be available in DT, too. When run, the script extracts the UUID of the PDF document from the annotations file and loads that file internally. For every page reference in the annotations file, it then looks at the page in the document and extracts its page label. If that is different from the page number in the annotation file, it adds the label in parentheses to the annotation. So,

[Page 5](x-devonthink-item://...) 

becomes

[Page 5 (72)](x-devonthink-item://...)

if the fifth page of the document is labelled “72”. If there are no page labels defined, or they are identical to the logical page number, the annotation is not modified.
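The rewrite rule described above can be tried out on a plain string, without DEVONthink or PDFKit. The following is only a minimal sketch: the function name `addPageLabels` and the `Map` from page number to label are assumptions made for this illustration, and the sample text is made up. The sketch uses a `(?!\d)` lookahead so that a reference to page 5 is not confused with the beginning of a reference to page 52.

```javascript
// Hypothetical helper: the name and the Map-based interface are assumptions
// for this sketch only.
function addPageLabels(text, labelsByPage) {
  let result = text;
  for (const [page, label] of labelsByPage) {
    // Skip pages without a label or whose label equals the page number.
    if (!label || String(label) === String(page)) continue;
    // (?!\d) keeps "Page 5" from matching the start of "Page 52".
    const pageRE = new RegExp(`\\[Page\\s+${page}(?!\\d)`, 'g');
    // A replacer function avoids surprises if a label contains '$'.
    result = result.replace(pageRE, match => `${match} (${label})`);
  }
  return result;
}

// Made-up sample annotations; the links are left elided on purpose.
const sample = [
  '[Page 5](x-devonthink-item://...)',
  '[Page 52](x-devonthink-item://...)',
  '[Page 6](x-devonthink-item://...)',
].join('\n');

// Page 5 is labelled "72"; page 6 has a label identical to its number.
const labels = new Map([[5, '72'], [6, '6']]);
console.log(addPageLabels(sample, labels));
```

Only the first line of the sample changes: page 52 has no entry in the mapping, and page 6 has a label equal to its logical number.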

ObjC.import('PDFKit');
ObjC.import('Foundation');
(() => {
	const app = Application('DEVONthink 3');
	const records = app.selectedRecords();
	records.forEach(r => {
		let txt = r.plainText();
		const uuidMatch = txt.match(/^(?:#\s+|\{\+\+\*\*\[).*\(x-devonthink-item:\/\/(.*?)\)/);
		const uuid = (uuidMatch && uuidMatch[1]) || undefined;
		if (!uuid) return; /* ignore files without a UUID in the first headline */
        /* Load the PDF document using the path to it */
		const PDFrec = app.getRecordWithUuid(uuid);
		const PDFpath = PDFrec.path();
		const PDFdoc = $.PDFDocument.alloc.initWithURL($.NSURL.fileURLWithPath($(PDFpath)));

		/* get all page numbers in the annotation file into an array */
		const pageRE = new RegExp(/(?:##\s+)?\[Page\s+(\d+)/,'gs');
		const pagesArray = new Set([...txt.matchAll(pageRE)].map(m => m[1]));
		const pageMapping = [];
             
        /* Loop over all page numbers in the annotation file to
           define the mapping to page labels */
		pagesArray.forEach(page => {
           /* Get the corresponding page from the PDF document. 
              The first document page is number 0 */
			const PDFpage = PDFdoc.pageAtIndex(page-1);
            /* Get the page label or undefined if it's not set */
			const pageLabel = (() => {
				try {
					return PDFpage.label.js;
				} catch {
					return undefined;
				}
			})();
			/* Save label for this page */
			pageMapping[page] = pageLabel;
		})

        /* Loop over all page number/page label pairs and 
           modify annotation if the two are different */
		for (let [page, pageLabel] of Object.entries(pageMapping)) {
			/* Skip this page if the page label is undefined or the same
			   as the page number. 'continue' moves on to the next page;
			   a 'return' here would abandon the whole record.
			   Note the usage of '==' as comparison operator:
			   it casts its operands to the same type */
			if (!pageLabel || (pageLabel == page)) continue;
			/* Build the regular expression to find all references to the current 'page'.
			   The lookahead (?!\d) keeps page 1 from also matching page 12 */
			const findPageRE = new RegExp(`(?:##\\s+)?\\[Page\\s+${page}(?!\\d)`,'gs');
			/* add the page label in parentheses behind the page */
			txt = txt.replaceAll(findPageRE, `$& (${pageLabel})`);
		}
        // Uncomment the next line to see the modified annotations in DT's log window
		//app.logMessage(txt);
        // Comment the next line to prevent modification of the annotation document
        r.plainText = txt;
	})
})()

The script relies on the DT item link to the PDF being available in either

  • the first level-one headline of the annotation file (i.e. the first line beginning with #), or
  • the first line beginning with {++**[.

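This can be changed by adding more/other regular expressions to the line
uuidMatch = txt.match....
Note that the expression is used without the m flag, so ^ only matches at the very beginning of the text: the line containing the link has to be the first line of the annotation file.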
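One way to add further patterns is to keep them in a list and take the first one that matches. This is only a sketch under stated assumptions: `uuidPatterns` and `extractUuid` are names invented here, the "Source:" line is a hypothetical third format, and the UUIDs in the sample strings are made-up placeholders. The third pattern uses the `m` flag, which lets `^` match at the start of any line instead of only at the start of the text.

```javascript
// Patterns are tried in order; the first capture group must hold the UUID.
const uuidPatterns = [
  /^#\s+.*\(x-devonthink-item:\/\/(.*?)\)/,          // level-one headline
  /^\{\+\+\*\*\[.*\(x-devonthink-item:\/\/(.*?)\)/,  // line starting with {++**[
  /^Source:\s+.*\(x-devonthink-item:\/\/(.*?)\)/m,   // hypothetical "Source:" line, anywhere
];

// Hypothetical helper: returns the UUID or undefined if no pattern matches.
function extractUuid(text) {
  for (const re of uuidPatterns) {
    const m = text.match(re);
    if (m) return m[1];
  }
  return undefined;
}

// Made-up sample texts with placeholder UUIDs.
console.log(extractUuid('# [My PDF](x-devonthink-item://ABC-123)\n\nsome notes'));
console.log(extractUuid('Notes\nSource: [My PDF](x-devonthink-item://DEF-456)'));
console.log(extractUuid('no link here'));
```

In the script, the result of such a helper could take the place of the `uuid` constant; files for which it returns `undefined` would be skipped as before.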

You should first test whether the script does what you want by making copies of the annotation files, selecting those, and then running the script on them. Alternatively, you could comment out the line
r.plainText = ...
and uncomment the line
app.logMessage…
Then the modified annotations will be written to DT’s log window without any changes to the annotation file itself.
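For long annotation files, logging the whole text can be hard to read. A possible alternative is to log only the lines that would change. The helper below is a hypothetical sketch (the name `changedLines` is an assumption); it relies on the old and new text having the same number of lines, which holds here because the script only inserts text within a line.

```javascript
// Hypothetical dry-run helper: returns one "line number: before -> after"
// string for every line that differs between the two texts.
function changedLines(before, after) {
  const oldLines = before.split('\n');
  const newLines = after.split('\n');
  const changes = [];
  oldLines.forEach((line, i) => {
    if (line !== newLines[i]) {
      changes.push(`${i + 1}: ${line} -> ${newLines[i]}`);
    }
  });
  return changes;
}

// Made-up example: only the second line differs.
const beforeText = 'Some note\n[Page 5](x-devonthink-item://...)\nAnother note';
const afterText = 'Some note\n[Page 5 (72)](x-devonthink-item://...)\nAnother note';
changedLines(beforeText, afterText).forEach(c => console.log(c));
```

Inside the script, the same idea would presumably read `changedLines(r.plainText(), txt).forEach(c => app.logMessage(c));` in place of the single `app.logMessage(txt)` call.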
