Get "real" PDF Date Properties?

I think I’m missing something. Whenever I import PDFs, DEVONthink does not pick up the true PDF properties. I am most concerned about the creation and modification dates.
When I import the PDF (via drag and drop from Finder), it usually, but not always, has the download date for created and modified. When I open the PDF in any PDF reader (tested on Adobe Reader, Google Drive, and NitroPDF), whether before or after import, the “true” dates are there.
How can I extract that and make it the real created date, especially on import or a mass update with an AppleScript?
I do a lot of bulk downloads and imports and it’s important to get the dates right so I can figure out what’s the most relevant info.

Welcome @jliebster

DEVONthink gets its dates from the operating system. I’m not sure what you feel is the true date and where you’re viewing it.

If I open it in a PDF viewer like Reader, Preview, Nitro PDF, or even Google Drive and get info on the PDF, it shows something completely different from what DEVONthink shows.

For example, if I download this NIST document mentioned below (sorry, this platform won’t let me post the URL, but it’s in the screenshot), and import via any method (use the Chrome plugin to import as paginated PDF, or download and drag-n-drop), it has all the dates of today. See the two methods here, respectively:

However, if you take that same downloaded file and open it up in the apps mentioned above or just put the URL in Chrome and get Document Properties, it shows the “true” dates.

These dates do not appear anywhere in the imported record, although the Modified date occasionally matches.

It would be nice if DT could read that info and use it in the import process.

Unless I am doing import all wrong.

DEVONthink gets this information from the Finder.
I just downloaded that PDF and here it is in the Finder…

Right, so it looks like I would need to find an AppleScript that would “scrape” that data from within the PDF and replace the creation date?

That’s a possibility.

Development would have to assess a change in behavior.

It doesn’t have to be AppleScript, but apparently a script is the way to go. Here is a simple JavaScript (JXA) example that prints the PDF metadata of the currently selected record in DT:

"use strict";
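ObjC.import('Foundation');
ObjC.import('PDFKit');

(() => {
  const DT = Application("DEVONthink 3");
  // Use the first record of the current selection
  const rec = DT.selectedRecords()[0];
  // Build a file URL from the record's path in the file system
  const URL = $.NSURL.fileURLWithPath($(rec.path()));
  // Load the PDF and unwrap its attribute dictionary into a JavaScript object
  const PDFDoc = $.PDFDocument.alloc.initWithURL(URL);
  const attributes = PDFDoc.documentAttributes.js;
  // Print every attribute; .js converts an Objective-C value (e.g. NSDate) to its JavaScript counterpart
  Object.keys(attributes).forEach(k => console.log(`Key ${k} - Value: ${attributes[k].js}`));
})()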

Save it in a file like “script.js” and then run
osascript -l JavaScript script.js
in the Terminal (after you have selected the PDF record in DT, of course). For the example, I get:

Key Author - Value: National Institute of Standards and Technology
Key Creator - Value: Microsoft® Word 2013
Key CreationDate - Value: Tue Apr 17 2018 19:45:53 GMT+0200 (CEST)
Key Producer - Value: Microsoft® Word 2013
Key ModDate - Value: Tue Apr 17 2018 20:50:14 GMT+0200 (CEST)
Key Title - Value: Framework for Improving Critical Infrastructure Cybersecurity, Version 1.1
Key Keywords - Value: [id __NSCFString]
Key Subject - Value: This publication describes a voluntary risk management framework (“the Framework”) that consists of standards, guidelines, and best practices to manage cybersecurity-related risk.  The Framework’s prioritized, flexible, and cost-effective approach helps to promote the protection and resilience of critical infrastructure and other sectors important to the economy and national security.

This release, Version 1.1, includes a number of updates from the original Version 1.0 (from February 2014), including: a new section on self-assessment; expanded explanation of using the Framework for cyber supply chain risk management purposes; refinements to better account for authentication, authorization, and identity proofing; explanation of the relationship between implementation tiers and profiles; and consideration of coordinated vulnerability disclosure. Complete information about the Framework is available at https://www.nist.gov/cyberframework.

which shows the creation and modification dates that you see in the other PDF programs, too.

Starting from that, it should be possible to set the creation and modification dates of the DT record. That might require some acrobatics, though, to get the dates into a format that both JavaScript and DT accept.

That could be achieved with something like the following code:

  /* These lines go inside the function of the script above, after "attributes" is defined */
  /* Create a date formatter for ISO dates */
  const formatter = $.NSISO8601DateFormatter.alloc.init;
  /* Get the modification date from the PDF */
  const modDate = attributes['ModDate'];
  /* Convert it to an ISO date string */
  const ISOdate = formatter.stringFromDate(modDate);
  /* Create a JavaScript Date object from the ISO date string */
  const JSDate = new Date(ISOdate.js);
  /* Set the record's modification date */
  rec.modificationDate = JSDate;
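
For the mass update asked about at the top of the thread, the two snippets could be combined roughly as follows. This is only a sketch and I have not run it against a real database. The helper names `unwrapDate` and `pdfDatesFromAttributes` are made up for this example, and it assumes that `creationDate` can be assigned in the same way as `modificationDate` above. It also relies on `.js` turning the PDF's date attributes directly into JavaScript Date objects, which is what the printed output above suggests, so the ISO formatter step is skipped here. The DEVONthink part only runs under `osascript`, so the helpers can be tried out elsewhere.

```javascript
"use strict";

// True if x is a JavaScript Date object.
function isDate(x) {
  return Object.prototype.toString.call(x) === "[object Date]";
}

// Hypothetical helper: accept either a plain Date or a bridged object whose
// .js property is a Date; return a valid Date or null.
function unwrapDate(value) {
  if (value === undefined || value === null) return null;
  const candidate = isDate(value) ? value : value.js;
  return isDate(candidate) && !Number.isNaN(candidate.getTime()) ? candidate : null;
}

// Hypothetical helper: pick the two dates out of a PDF attribute dictionary.
// The keys "CreationDate" and "ModDate" are the ones printed by the script above.
function pdfDatesFromAttributes(attributes) {
  return {
    created: unwrapDate(attributes["CreationDate"]),
    modified: unwrapDate(attributes["ModDate"]),
  };
}

// DEVONthink part: only executed when run via "osascript -l JavaScript".
if (typeof Application !== "undefined" && typeof ObjC !== "undefined") {
  ObjC.import("Foundation");
  ObjC.import("PDFKit");
  const DT = Application("DEVONthink 3");
  DT.selectedRecords().forEach(rec => {
    const url = $.NSURL.fileURLWithPath($(rec.path()));
    const doc = $.PDFDocument.alloc.initWithURL(url);
    if (doc.isNil()) return;            // not a readable PDF, skip it
    const raw = doc.documentAttributes;
    if (raw.isNil()) return;            // PDF without metadata, skip it
    const dates = pdfDatesFromAttributes(raw.js);
    // Assumption: creationDate is writable like modificationDate.
    if (dates.created) rec.creationDate = dates.created;
    if (dates.modified) rec.modificationDate = dates.modified;
  });
}
```

Select the imported PDFs in DT, then run the file with osascript as described above. Records whose PDF carries no usable date are left untouched, so it would be wise to try it on a few duplicates first.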