Extract date from PDF and set it as creation date

Hi there,

first of all I’m new to the forum and a script newbie.

I have many PDFs with a „wrong“ creation date (because I often import URLs (most of them News) via “mail to” Bookmarklet but the Mac or Mail.app are not always running at that time - and of course not all URLs were created the day I send them via Mail).

So I want to set the date found in a PDF to the creation date. (I tried myself on an AppleScript to do the trick and it works - but I ended up realizing that it would be too much work to match the delimiters for many different sources.)

Searching the forum I found a very nice JavaScript to extract the first date found in a PDF and set it to the creation date (found here: Find out the Date in an OCR Scanned PDF and Rename to Date).

This works fine as long as the month is numeric, but it won’t work if its found as name (e.g. „Januar“).

Can someone please help me with the script?

And is there any chance to also add the time (if found in the PDF) to the creation date? (This would be useful to see which news were published first (like a log) even if they came up on the same day.)


var months = ['Jan', 'Feb', 'M[äa]r', 'Apr', 'Ma[iy]', 'Jun', 'Jul', 'Aug', 'Sep', 'O[ck]t', 'Nov', 'De[cz]'];

var monthsRE = months.map(function (x) { 
   return new RegExp(x); });
   

var monthString = "(?:(0?[[1-9]|1[012])[-./ ]+|(" 
 + months.join('|')  // All month names as alternatives
 + ")[a-z]*\\s+)";    // followed by possibly more characters (long month name) and at least one space

var dayString = "(0?[1-9]|[12]\\d|3[01])[-./ ]+";

var yearString = "((?:[12]\\d)?(?:\\d{2}))";

var REString = dayString + monthString + yearString;
var dateRE = new RegExp(REString);



var Devon  = Application("com.devon-technologies.thinkpro2");

Devon.includeStandardAdditions = true
 
var pr = Devon.properties();
var selection = pr['selection'];

if (!selection || selection.length === 0) {
  Devon.displayAlert("Erst Datensätze auswählen");
} else {
  for (var i = 0; i < selection.length; i++) {
    var record = selection[i];
   if (record.type() === 'PDF document') {
       var t = record.plainText();
       var found = t.match(dateRE);
      if (found) {
          console.log(found);
        Devon.displayAlert(found[0]);
          var tag = +found[1];
        var monat = found[2];
        if (+monat === 0) { // month as string
          monat = found[3];
          monthsRE.every(function (m, i) {
           if (m.match(monat)) {
             monat = i + 1;
            return false;
           } else {
             return true;
           }   
          });
        }
        var j = +found[4];
        var jahr = j < 100 ? +j + 2000 : j;
        

        var result = Devon.displayDialog(record.name() + 
          '\nDatum ändern zu:' ,{
             defaultAnswer: tag + '/' + monat + '/' + jahr,
              buttons: ["Abbrechen", "Ändern"],
                          defaultButton: "Ändern"
                          });
         if (result.buttonReturned === 'Ändern') {
          record.date = new Date(jahr, monat-1, tag);
        }
      } else {
        Devon.displayAlert('Kein Datum gefunden');
      }
     }
}
}

No ideas?

You want to change some date (which date?) found inside a PDF to the PDF file’s creation date? That’s tricky and may be beyond the limited ability of AppleScript to modify the text in a PDF document.

The first date found is in my case almost always the right one. The Script I posted does what I’m looking for, but only if the date is like 18.2.2017. If it’s 18. Februar 2017 the Script won’t recognize the month, in the dialog the default answer then is 18/undefined/2017. I don’t understand this - months as names appear right in the first line of the Script…

The Script shows up the right date (= first date in PDF) in the first dialog, no matter if the month is given as name or not. The Problem is that the next dialog (the one that asks for permission to change the creation date) does not display the right date. Dialog 1 does, Dialog 2 says “/undefined/”. The Script can do it but there’s a flaw :confused:

WARNING: if the second dialog says “/undefined/” and you click change you will loose the creation date of the file!