Although I do not quite understand what a “correct date” means, here’s what I suggest as a regex:
\d+[-.]\d+[-.]2\d+
for the first variation and its subclasses.
Oh, well, that catches the second variation, too. And the fourth one.
˚202\d-\d+-\d+` for the third one.
However, I’d advise against restricting the years at all. This kind of RE will stop working in eight years time, and then you might be wondering why. If I understand it correctly, you are dealing with two basic date formats: German (day separator month separator year) and ISO (year separator month separator year). And fortunately no named months.
The first one might be captured like this:
(\d\d?)[-.\/](\d\d?)[-.\/](\d{2,4})
Dissected:
-
\d\d?
is a single digit with an optionally following one. Captures everything from 0 to 99, including leading zeros
-
[-.\/]
captures one of the characters ‘-’, ‘.’ or ‘/’. Note that the latter one might have to be escaped since it is used to limit REs. Not doing that might result in problems, escaping does no harm. Also note that the -
has to come first in a character class! Otherwise, it is indicating a range (as in [1-9]
in the example below).
-
\d{2,4}
captures between two and four digits, i.e. the year in the complete and abbreviated form.
This works for all years, but it has (of course) some shortcomings, since it would also capture something like “45-07-1923” which might be a telefon number or a code used for archiving or what not. In order to prevent that, you could refine the expression, and that’s where the fun starts:
\b(0?[1-9]|[12]\d|3[01])[-.\/](0?[1-9]|1[012])[-.\/](\d{2,4})
-
\b
means a word boundary, i.e. we do want a stand-alone date string. More on that below
-
0?[1-9]
catches all single-digit days with an optional leading zero
-
[12]\d
catches all days between the 10th and the 29th
-
3[01]
catches the last one or two days of the month for all months but February
-
|
is the alternation operator so that the three preceding sub-expressions are alternatives and the first matching one is selected
- Something similar is happening for the months, so no need to dissect that.
Now, this expression will not match “45-07-1923”, but only because of the leading \b
. Otherwise it would find a match starting at the “5”, and we do not want that. So the leading \b
ensures that, and we’d even be more secure with appending the same to the end of the RE (thus preventing partial matches for something like “12-34-5678AB”.
Also, this refined RE would happily match “30-02-2022”. One can prevent that, too, with some more work.
In any case, the day, month and year captured in groups ()
and accessible later als $1
, $2
, and $3
.
I recommend fiddling around with REs in https://regex101.com. It’s very easy to test REs against real-life strings there: As you can see below, matched and unmatched strings are clearly marked.