DT doesn't recognise all dates

The attached PDF contains three dates. I would expect the following rule to change the document name to 15.12.1940 - but it doesn’t - instead, the oldest date is recognised as 07.03.1976. Why is that?

07.03.76.pdf (23.7 KB)

The date “15.12.1940” - together with the leading asterisk - is copied from an OCR'd document (which I cannot provide) - it isn’t recognised there, either.

Removing the asterisk makes no difference, btw.

DT 3.5.1 on 10.15.5 (19F101) with system language “English (UK)” and Region “Germany”

Here are my returns in AppleScript…

tell application id "DNtp"
	set sel to item 1 of (selection as list)
	document date of sel
	--> date "Sunday, March 7, 1976 at 12:00:00 AM"
	oldest document date of sel
	--> date "Sunday, March 7, 1976 at 12:00:00 AM"
	newest document date of sel
	--> date "Wednesday, July 1, 2020 at 12:00:00 AM"
end tell

@cgrunenberg will have to look into this.

I look forward to a detailed technical analysis. Or just a fix, if the analysis is likely to make my :exploding_head:

Thanks you two :slight_smile:

Only years from 1950-2050 are currently accepted to make the fuzzy recognition more accurate.

Since only dates after 1950 are automatically recognized (and I’m not so sure that even they are if they contain spaces as in your case), I think you might be better off with scripting (or rather: that’s your only chance in this case). Personally, I’d go for a JavaScript script because it supports regular expressions, which make things a lot easier.

Something like this (NOT TESTED!):

var app = Application('DEVONthink 3');
app.includeStandardAdditions = true;

var sel = app.selection();
/* day and month: 1-2 digits, year: 4 digits, optional spaces around the dots */
var date = /\d{1,2}\s*\.\s*\d{1,2}\s*\.\s*\d{4}/g;
sel.forEach(el => {
  var txt = el.plainText();
  var matches = txt.match(date);
  if (!matches) return; /* no date found in this record */
  var datesFound = matches.map(currentDate => {
    /* "30. 12. 1940" => "1940-12-30", day and month padded to two digits */
    var parts = currentDate.replace(/\s+/g, "").split(".");
    return [parts[2], ("0" + parts[1]).slice(-2), ("0" + parts[0]).slice(-2)].join("-");
  }); /* matches.map */
  /* find latest date: ISO-style strings sort chronologically */
  var sorted = datesFound.sort().reverse();
  var lastDate = sorted[0].split("-").reverse().join(".");
  /* "1940-12-30" => "30.12.1940" */
  el.name = "whatever" + lastDate;
}) /* sel.forEach */

I’m sure that it’s feasible in AppleScript, too. But I’m too lazy to try that :wink:
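
For anyone who wants to try the date handling on its own, outside DEVONthink, here is a minimal sketch in plain JavaScript. The function names extractDates and toGerman and the sample text are made up for illustration; nothing here is part of DEVONthink's scripting interface, and the pattern assumes dates are always written day.month.year with a four-digit year.

```javascript
// Hypothetical helper (not a DEVONthink API): collects every D.M.YYYY date
// in the text and returns them as ISO-style strings, sorted oldest first.
function extractDates(text) {
  const pattern = /(\d{1,2})\s*\.\s*(\d{1,2})\s*\.\s*(\d{4})/g;
  const found = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    const day = m[1].padStart(2, "0");
    const month = m[2].padStart(2, "0");
    found.push(`${m[3]}-${month}-${day}`);
  }
  // Zero-padded YYYY-MM-DD strings sort chronologically as plain text.
  return found.sort();
}

// "1940-12-15" => "15.12.1940"
function toGerman(iso) {
  return iso.split("-").reverse().join(".");
}

// Invented sample text that uses the three dates mentioned in this thread.
const sample = "*15. 12. 1940, Untersuchung am 07.03.1976, gedruckt 1.7.2020";
const dates = extractDates(sample);
const oldest = toGerman(dates[0]);
const newest = toGerman(dates[dates.length - 1]);
console.log(oldest, newest);
```

Sorting the normalized strings gives both ends of the range at once, so picking the oldest instead of the latest date is just a matter of which end of the array is used.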

That only makes sense for those not dealing with people and their dates of birth :wink: although I suspect changing the behaviour would just inconvenience others. At least in Germany it is typical to mark dates of birth with an *, I wonder whether that would help…

I could easily extract the date with a script, but as detailed in this post (Can Scan Text Date extract a /21.04.20 and how do I make that the created date?) scripts dealing with dates seem not to work when triggered automatically…

The next release will extend the range to 1900-2100, this should be sufficient in almost all cases :slight_smile:
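
To see why the range matters for the PDF in question, here is a small illustration in plain JavaScript. It is not DEVONthink's recognition code, which isn't public; the function oldestAccepted is a hypothetical stand-in that simply drops dates whose year falls outside a given range, and the input is the three dates mentioned in this thread.

```javascript
// Illustration only, NOT DEVONthink's implementation: keep the dates whose
// year lies within [minYear, maxYear] and return the oldest one (or null).
function oldestAccepted(isoDates, minYear, maxYear) {
  const accepted = isoDates.filter(d => {
    const year = Number(d.slice(0, 4));
    return year >= minYear && year <= maxYear;
  });
  accepted.sort(); // YYYY-MM-DD strings sort chronologically
  return accepted.length > 0 ? accepted[0] : null;
}

// The three dates mentioned in this thread, written as YYYY-MM-DD.
const threadDates = ["1940-12-15", "1976-03-07", "2020-07-01"];
const withCurrentRange = oldestAccepted(threadDates, 1950, 2050);
const withExtendedRange = oldestAccepted(threadDates, 1900, 2100);
console.log(withCurrentRange, withExtendedRange);
```

With the 1950-2050 range the 1940 date is filtered out and 07.03.1976 becomes the oldest date, which matches what was reported above; with 1900-2100 the 1940 date survives.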

I don’t think that that’s the case. The script you’re mentioning is quite a handful of piped Unix commands, notably sed. You might want to try an approach like the one I outlined above, where all the date extraction and modification are handled in an (internal or external) JavaScript. If you know that you only want to get the birth dates and if those are always preceded by an asterisk (not by e.g. “geb.” or “geboren”), then you can modify the regular expression (var date = ) above like so

var date = /\*\s*(\d{1,2}\s*\.\s*\d{1,2}\s*\.\s*\d{4})/;

and get rid of the matches.forEach part (provided there is only one such date in the file). var match = txt.match(date); lastDate = match[1]; should then give you what you need (match[0] would still include the asterisk).
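
As a rough standalone sketch of the asterisk variant (plain JavaScript, untested inside DEVONthink): the function name birthDate is made up for illustration, and it assumes the birth date is the first date in the text that directly follows an asterisk.

```javascript
// Hypothetical helper (not a DEVONthink API): returns the first date that
// follows an asterisk as DD.MM.YYYY, or null if there is none.
function birthDate(text) {
  const m = text.match(/\*\s*(\d{1,2})\s*\.\s*(\d{1,2})\s*\.\s*(\d{4})/);
  if (m === null) return null;
  // m[0] is the whole match including the asterisk; m[1] to m[3] are the parts.
  return `${m[1].padStart(2, "0")}.${m[2].padStart(2, "0")}.${m[3]}`;
}

// Invented sample lines for illustration.
const withAsterisk = birthDate("Patient *15.12.1940, Befund vom 07.03.1976");
const withSpaces = birthDate("geboren * 5. 3. 1940");
const withoutAsterisk = birthDate("Befund vom 07.03.1976");
console.log(withAsterisk, withSpaces, withoutAsterisk);
```

Returning null when nothing matches lets the calling script skip a record instead of failing on it.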

First off, thank you very much for taking the time and making the effort to help me :slight_smile: I’m clueless as far as scripting is concerned, but when I get up tomorrow, I’m going to dissect your little script and see what it does, and how and why :slight_smile:

I haven’t come to a final conclusion re the script in my other post; it works perfectly as soon as I run the appropriate rule manually - but nothing happens when that exact same rule is triggered automatically; there’s something weird about that, and I’m still playing about trying to find out which component it is exactly. And whilst this post is specifically about dates of birth, the other post is about document dates.