Updating metadata keywords in a PDF record

Just sharing this to add a little back to the community.

If you want to update the metadata keywords (or other fields) that are actually stored inside a PDF held in DT, this is the most efficient way I’ve found of doing so.

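use AppleScript version "2.4"
use framework "Foundation"
use framework "Quartz" -- PDFDocument (PDFKit) is available through Quartz
use scripting additions

on NSdataFromASdata(asData)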
	return (current application's NSArray's arrayWithObject:asData)'s firstObject()'s |data|()
end NSdataFromASdata

on ASdataFromNSdata(nsData)
	set theCode to current application's NSHFSTypeCodeFromFileType("'rdat'")
	return (current application's NSAppleEventDescriptor's descriptorWithDescriptorType:theCode |data|:nsData) as data
end ASdataFromNSdata


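-- Assumes theRecord (a PDF record in DT) and documentKeywords (a list of strings) were set earlier in the script
set pdfDoc to (current application's PDFDocument's alloc()'s initWithData:(my NSdataFromASdata(data of theRecord)))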
set PDFAttributes to current application's NSMutableDictionary's dictionaryWithDictionary:(pdfDoc's documentAttributes())
PDFAttributes's setValue:(documentKeywords as list) forKey:"Keywords"
set pdfDoc's documentAttributes to PDFAttributes
set data of theRecord to (my ASdataFromNSdata(pdfDoc's dataRepresentation()))

If you just update the PDF directly outside DT, DT is unaware of the change. This approach ensures DT indexes the changes as they are made. You can use the same approach to update other fields in the PDF metadata.

Parts of this are shamelessly lifted from ShaneStanley’s post (#4) in the thread “Convert NSData into raw data string” on the Late Night Software AppleScript forum.

If I understand the code correctly, it

  • gets the current PDF metadata (the PDFDocument’s documentAttributes) as an NSMutableDictionary,
  • sets the Keywords key in this dictionary to the list documentKeywords,
  • and replaces the documentAttributes with this modified dictionary.

I suppose that this overwrites any previously defined keywords instead of updating (i.e. merging) them, but I may be wrong.

In any case, I think the code could be made much simpler (and forgo ASdataFromNSdata as well as NSdataFromASdata) by

  • creating the PDFDocument from theRecord’s path property: get a file URL from the path, then call initWithURL on the PDFDocument
  • writing out the PDFDocument with writeToFile after setting the documentAttributes

Might be a tad faster, too :wink:

What do you mean by “outside DT” – with a script, in an app?

That is correct; in my use case I wanted to replace them. But it’s simple for somebody else to change if required.
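
For anyone who would rather merge than replace, the list logic itself is small. Here is a minimal sketch in plain JavaScript. `mergeKeywords` is a hypothetical helper name, not part of DT or PDFKit, and it assumes the existing keywords have already been read out of the PDF as an array of strings (a missing or non-array value is treated as empty).

```javascript
// Hypothetical helper: combine existing PDF keywords with new ones,
// keeping first-seen order and dropping blanks and case-insensitive duplicates.
function mergeKeywords(existing, additions) {
  // Assumption: a PDF without keywords yields something that is not an array.
  const current = Array.isArray(existing) ? existing : [];
  const seen = new Set();
  const merged = [];
  for (const keyword of [...current, ...additions]) {
    const trimmed = String(keyword).trim();
    const key = trimmed.toLowerCase();
    if (trimmed !== '' && !seen.has(key)) {
      seen.add(key);
      merged.push(trimmed);
    }
  }
  return merged;
}

// Example: "Tax" duplicates the existing "tax", so only "receipt" is appended.
const merged = mergeKeywords(['invoice', 'tax'], ['Tax', 'receipt']);
```

The merged list would then be passed wherever the script currently hands over the replacement keyword list.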

If you go via initWithURL and update the PDF file on disk, DT does not detect the change. If you view the file in DT or with other tools, the keywords have been updated, but if you search for those keywords, DT doesn’t find them.

Did you try running DT’s indicate method after writing the PDF? That is supposed to update the index.

I did try ‘indicate’ but it didn’t pick up on the keyword changes in the PDF metadata.

Weird. Perhaps @cgrunenberg has an idea.

indicate is used to index files/folders; synchronize is the right command.

Unsurprisingly, you are correct. Thanks a bunch.

Here’s a JavaScript version of @MrSkooby’s script, following the path I suggested before:

ObjC.import('PDFKit');
(() => {
  const app = Application("DEVONthink 3")
  /* Get the record to add keywords to */
  const rec = app.getRecordWithUuid('x-devonthink-item://83E31EAE-991E-41A3-A146-20E4A77295E8');

  /* create a PDFDocument from the record's path converted to an URL */
  const PDFDoc = $.PDFDocument.alloc.initWithURL($.NSURL.fileURLWithPath($(rec.path())));

  /* get the documentAttributes as a mutable dictionary so we can set the keywords in it */
  const PDFAttributes = $.NSMutableDictionary.dictionaryWithDictionary(PDFDoc.documentAttributes);

  /* Set the "Keywords" entry of the PDFAttributes to something unique 
     for testing purposes */
  PDFAttributes.setValueForKey($([$('blurbsel')]), $('Keywords'));

  /* Update the PDF's documentAttributes */
  PDFDoc.documentAttributes = PDFAttributes;

  /* Write back the PDF */
  PDFDoc.writeToFile($(rec.path()));

  /* Force DT to update its index for this record */
  app.synchronize({record: rec});
})()

The setValueForKey call is a bit weird because it needs Objective-C data structures. $([…]) converts a JavaScript array into an NSArray, while $('…') converts a JavaScript string into an NSString. Therefore, $([$('blurbsel')]) is an NSArray with an NSString as its only element.
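
To pass more than one keyword, the same wrapping can be applied to every element of a list. A small sketch: `wrapKeywords` is a hypothetical helper, and the conversion function is passed in as a parameter (in JXA you would pass `$`), so the function itself does not depend on the Objective-C bridge.

```javascript
// Hypothetical helper: wrap each keyword, then wrap the resulting array,
// following the $([$('…')]) pattern used in the script.
// `wrap` is the conversion function; in JXA that would be `$`.
function wrapKeywords(keywords, wrap) {
  return wrap(keywords.map((keyword) => wrap(String(keyword))));
}

// Intended use inside the JXA script (assumption, not run here):
//   PDFAttributes.setValueForKey(wrapKeywords(['alpha', 'beta'], $), $('Keywords'));
```

Passing the wrapper in keeps the helper easy to try outside Script Editor with a stand-in function.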

I was (or I think I was) using indicate against the path property of a record. The same path I used to update the PDF file when I was testing the initWithURL route. But it didn’t appear to work. All searches for the newly added keywords failed.

Unfortunately I didn’t spot synchronize, or if I did, I assumed it was related to the “sync” process. Oh well, learned something new!

Thanks.