Another date not recognised

In the attached document, DT recognises 30.06.2020 as the newest document date. Is the German formatting of the date (which is one of the standards in Pages) in the top right hand corner of the document not recognised by DT? Is this a bug, or is it simply a case of “hey, we can’t do every format under the sun”?

Cheers :slight_smile:

Untitled 2.pdf (35.3 KB)

And there’s another thing; the same document contains the line “Rechnungsnummer:”

The following SmartRule lists the document as fulfilling the criteria of the rule:

If I run the rule on the document, however, nothing happens, i.e. the actions are not performed. If I change the rule to Content matches Rechnungsnummer (without the “:”), the rule still lists the document as fulfilling its criteria, and when run it also performs the actions.

Couldn’t you use a regular expression to match any such date?
The pattern seems to be (not actual regex):

(10 or less characters), ([1-31]), point, (9 or less characters), (20 + [20-99])

and move that into custom meta-data?

Absolutely I could, I’m getting quite good at that now that I have 6 days of experience in scripting :stuck_out_tongue_winking_eye: (thanks again @BLUEFROG and @chrillek) - I was just wondering whether I need to, or whether this is a bug to be fixed for the next release :wink:

Very good. As you undoubtedly know, you learn fastest while trying.

I see. That does make you dependent on the bug-fix schedule, but that’s your choice of course.

This crude example might do the trick until the year 2099 (which should be enough if you don’t cryopreserve yourself):

([a-z]*, [0-9]*\. [a-z]* 20[0-9]*)

Or somewhat more refined for German dates and if accepted as ICU dialect:

([a-zA-Z]{6,10}, [0-9]{1,2}\. [a-zA-Z]{3,9} 20\d\d)
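
For anyone who wants to try the refined pattern outside DT first, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in for ICU (the syntax used here is common to both), it escapes the dot after the day so that it only matches a literal point, and the helper name `find_long_dates` is made up for the example.

```python
import re

# Refined pattern from the post above, with the dot after the day escaped.
# Assumption: Python's re module stands in for the ICU engine used by DT.
LONG_DATE = re.compile(r"[a-zA-Z]{6,10}, [0-9]{1,2}\. [a-zA-Z]{3,9} 20\d\d")


def find_long_dates(text):
    """Return every 'Weekday, D. Month 20YY' style date found in text."""
    return LONG_DATE.findall(text)


# Sample text containing both a long date and a digits-only date.
sample = "Rechnung vom Mittwoch, 1. Juli 2020, erstellt am 30.06.2020"
print(find_long_dates(sample))
```

The digits-only date is deliberately left alone, so only the long form is picked up.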

Absolutely. But you know how sometimes you should actually be doing something else…? I’ve just spent the whole day playing with scripts rather than doing what I should be :see_no_evil:

Thanks for your ideas - I’ll play with them :+1:t3:

Not to put too fine a point on your question, but the date here is “1. Juli 2020”, which is 1.7.2020 in the equivalent numerals-only format. But it’s not the same.

And I guess that date formats with month literals are beyond the scope of automatic date recognition in DT3. Just think about “March 3” and “3rd March”.
I think, again, that you’ll have to go with scripting and regular expressions. The REs are just becoming more complex…

I don’t quite get the first part: 6 to 10 letters followed by a comma – why would that be the beginning of a German date?

The month part ([a-zA-Z]{3,9}) misses out on März (3rd month). I’d suggest something like
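[ADFJMNOS][a-zä]{0,8}\s+(?:\d{4}|\d{2})
An uppercase letter for the possible months, followed by up to 8 lowercase characters for the rest of the month, followed by at least one whitespace character, followed by either 4 or 2 digits (years are sometimes abbreviated to two digits). The two year alternatives are grouped so that the | applies only to them, and the 4-digit form comes first so that 2020 is not cut short to 20.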
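
A quick way to check the month part is a small Python sketch like the one below. It is one runnable reading of the description above, not a tested DT rule: Python's `re` is assumed as a stand-in for ICU, the two year alternatives are grouped with the 4-digit form first, and `match_month_year` is a made-up helper name.

```python
import re

# Initial capital of a German month, up to 8 lowercase letters (incl. ä),
# whitespace, then a 4-digit or 2-digit year.
MONTH_YEAR = re.compile(r"[ADFJMNOS][a-zä]{0,8}\s+(?:\d{4}|\d{2})")


def match_month_year(text):
    """Return the first 'Month Year' fragment in text, or None."""
    m = MONTH_YEAR.search(text)
    return m.group(0) if m else None


for candidate in ["1. Juli 2020", "3. März 2020", "15. Mai 20", "30.06.2020"]:
    print(candidate, "->", match_month_year(candidate))
```

Trying the 2-digit alternative first would stop after the first two digits of a 4-digit year, which is why the order is swapped in this sketch.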

It’s alright, you’ve already sold me, I’m already sitting here scripting :stuck_out_tongue:

@anon6914418 was being specific in regard to my original question: his leading section is the day (from 6 ch Montag to 10 ch Donnerstag so e.g. Mittwoch, 1. Juli 2020); thanks for the tip re ä

Yup, my idea was to only match that combination to prevent matching other dates present like short dates or ISO dates. If you require another combination or a less specific match, by all means change it of course.

If you try and solve it on your own first (as you’re doing), you will likely improve your understanding of regular expressions. Copying and pasting other people’s work keeps you more dependent on them, so only ask for help when you get stuck.

Ok, but then I suggest using
[DMFS][a-z]{5,9},\s+
for the weekday. The problem with REs is that they quite often pick up more than intended, so it’s better to be specific. \s+ requires at least one space but permits more.
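
Putting the weekday, day, month and year pieces from this thread together, a rough Python sketch could look like this. It is only an illustration under assumptions: Python's `re` stands in for ICU, the month table and the function name `parse_german_long_date` are made up for the example, and nothing here touches DT's scripting interface or custom metadata.

```python
import re
from datetime import date

# German month names mapped to month numbers (table made up for this sketch).
MONTHS = {
    "Januar": 1, "Februar": 2, "März": 3, "April": 4, "Mai": 5, "Juni": 6,
    "Juli": 7, "August": 8, "September": 9, "Oktober": 10, "November": 11,
    "Dezember": 12,
}

# Weekday and comma, day with a literal point, month name, 4-digit year.
FULL_DATE = re.compile(
    r"[DMFS][a-z]{5,9},\s+(\d{1,2})\.\s+([ADFJMNOS][a-zä]{2,8})\s+(\d{4})"
)


def parse_german_long_date(text):
    """Return the first long German date in text as a date, or None."""
    m = FULL_DATE.search(text)
    if m is None:
        return None
    day, month_name, year = m.groups()
    month = MONTHS.get(month_name)
    if month is None:
        return None  # matched the shape, but not a real month name
    try:
        return date(int(year), month, int(day))
    except ValueError:
        return None  # impossible day, e.g. 31. Februar


print(parse_german_long_date("Mittwoch, 1. Juli 2020"))
```

Looking the month up in a table after matching keeps the regex short while still rejecting words that merely look like a month.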

yeah, my script just went off and made a cup of coffee, nicked an apple and is going out to play ball :smiley:

Thanks folks, all coming along nicely - just playing so that one script deals with multiple document types; lots of fun, and I’ll get back to you all when I get re-eeeally stuck :slight_smile:

Have fun: http://userguide.icu-project.org/strings/regexp

Oh no: another one leaving me behind because I’ve not yet started! :grinning: Good luck!

Stephen

In case of multiple found date formats the ones based on digits (in this case 30.06.2020) are preferred currently but we’ll look into this.