How to specify an exact date format with additional characters (e.g. for Batch Processing)?

Hey there,

I have documents from a hosting company, and I want to add the document date to the file name automatically.
The problem is, they show the packages like this:

www.domain.com (01.01.2019 - 31.12.2019)

The document date higher above is e.g. 12.12.2019.
So the script is set to use the newest document date and, of course, finds 31.12.2019.

How can I EXCLUDE all dates beginning or ending with a parenthesis?
Like excluding (11?11?1111 and 11?11?1111) - then it should use the document date, right?

Any help highly appreciated :slight_smile:

When you say

they show the packages like this:
www.domain.com (01.01.2019 - 31.12.2019)

are you talking about the file’s name? If not, what do you mean by “package” - an archive like ZIP or tar?

Also, I’m not sure what

The document date higher above is e.g. 12.12.2019.

means. Do you want to say something like
“the documents contain lines with different dates, the most recent of them is 12.12.2019”?
If not, what do you mean? Could you show an excerpt of the document?

Where are the dates “in parentheses” coming from, what do they stand for?

If you want to extract a date from the document that is not the newest or oldest one, you’ll probably have to resort to scripting.

Hey @chrillek, thanks for the response and sorry for the confusion:

Yes I think a screenshot can clarify my issue:

The documents are PDFs. By “package” I mean the hosting package; maybe “hosting bundle” would be more accurate in English?
The desired date is marked in green (it is the document date I need for the file name).

You see many dates here: one states the first order date, the others state the timeframe of the invoice. The problem is that the end of the timeframe is later/newer than the document date, but I need to fetch the document date.

What’s the first date in the document after converting it to plain text (see Data > Convert)?

The first date is 27.05.2015 (date of first order and earliest date),
the second date is 29.02.2020 (the desired one).

As the desired date doesn’t have a useful prefix/suffix that could be used by the Scan Text smart rule action, the only option is unfortunately to script this (e.g. by using the all document dates property of records).
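To illustrate what such a script could do with the original idea (skip every date that touches a parenthesis), here is a minimal sketch in plain Python. It does not use DEVONthink's scripting interface at all; the function name, the sample text and its "First order:" label are made up for illustration, and whether the Scan Text action accepts the same lookaround syntax is not something this sketch confirms.

```python
import re

# dd.mm.yyyy, but only if the date is not directly preceded by "("
# and not directly followed by ")". The \d in both lookarounds keeps
# the pattern from matching in the middle of a longer digit run.
DATE_NOT_IN_PARENS = re.compile(r"(?<![(\d])(\d{2}\.\d{2}\.\d{4})(?![)\d])")


def dates_outside_parentheses(text):
    """Return all dd.mm.yyyy dates that do not touch a parenthesis."""
    return DATE_NOT_IN_PARENS.findall(text)


# Hypothetical plain-text excerpt, loosely modelled on the invoice in this thread.
sample = (
    "First order: 27.05.2015\n"
    "29.02.2020\n"
    "www.domain.com (01.01.2019 - 31.12.2019)\n"
)

print(dates_outside_parentheses(sample))
```

Note that this only drops the two timeframe dates; the first order date still survives, so a script would need one more criterion (position, or "newest of the remaining dates") to pick the document date.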

Hi there,

after tinkering around I found that it is possible to use REGEX in batch processing :man_facepalming:.
I noticed that the easiest way to tell the dates apart is that my desired date sits on a line of its own. So I did the following:

Tools -> Batch Processing with the following settings

Scan Text | Regular Expression | \n(\d{2}\.\d{2}\.\d{2,4})\n
Change Name | %recordName% —\1
(the — character I use for visual purposes only)

That searches for every date that is preceded and followed by a line break.
For those users who can’t read REGEX, this is what happens here:
\n = new line
\d = any digit
{2} = exactly 2 of them; {2,4} = between 2 and 4 of them
\. = a literal . (the full stop (.) stands for any character in REGEX, so we need to escape it with \ )
( … ) is the group we can reference with:
\1

(thanks to @chrillek for suggesting to tighten the REGEX by requiring an exact number of digits with {…}; see the post below)
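If you want to try the pattern outside of DEVONthink first, a small Python sketch like the following should behave the same way for this simple case. The function name and the sample text are assumptions of mine, not anything DEVONthink provides; group(1) plays the role of \1 in the Change Name action.

```python
import re

# Same pattern as in the Scan Text rule: a date alone on its own line.
DATE_ON_OWN_LINE = re.compile(r"\n(\d{2}\.\d{2}\.\d{2,4})\n")


def document_date(text):
    """Return the first date that sits on a line of its own, or None."""
    match = DATE_ON_OWN_LINE.search(text)
    return match.group(1) if match else None


# Hypothetical plain-text version of an invoice like the one in this thread.
sample = (
    "Invoice\n"
    "First order: 27.05.2015\n"
    "29.02.2020\n"
    "www.domain.com (01.01.2019 - 31.12.2019)\n"
)

print(document_date(sample))
```

One thing this makes visible: because the pattern needs a line break on both sides, a date on the very first or very last line of the text without a surrounding line break would not be found.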

Worked AWESOME!

P.S.: Just because I thought of it:
If you have a German date format (like I have) and want to convert it into the English one (e.g. 16.03.2020 to 2020-03-16), you have to rewrite the rule to:
\n(\d{2})\.(\d{2})\.(\d{2,4})\n (group the numbers only)
and the Name Change to %recordName% —\3-\2-\1

Of course you can also use this trick to convert any English date back to the German format :slight_smile:
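The group reordering can be checked the same way in Python, where \1, \2 and \3 work in the replacement string just like in the Change Name action. This is only a sketch: the function names are made up, and it assumes a four-digit year (\d{4}) rather than {2,4}, so a date like 16.03.20 is left unchanged.

```python
import re


def german_to_ymd(date):
    """Turn dd.mm.yyyy into yyyy-mm-dd by reordering the three groups."""
    return re.sub(r"(\d{2})\.(\d{2})\.(\d{4})", r"\3-\2-\1", date)


def ymd_to_german(date):
    """Turn yyyy-mm-dd back into dd.mm.yyyy."""
    return re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", date)


print(german_to_ymd("16.03.2020"))  # 2020-03-16
print(ymd_to_german("2020-03-16"))  # 16.03.2020
```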


That’s cool. I’d have suggested to go with JavaScript and a RegEx, but your approach is (at least for me) something new.

And I’d suggest to be a bit more specific in the RE, like \d{4} for the year.

Wow! What a compliment :star_struck:
Means much to hear that from a guru like you!

And of course you’re very right when you say:

And I’d suggest to be a bit more specific in the RE, like \d{4} for the year.

I’ve changed it in the above explanation so other users can benefit from it, too :slight_smile:

I’m pretty new to REGEX, to be honest, and sometimes I forget to make it more specific.
I’m used to catching the most fish with the least effort, like working with a fishing net, but that’s definitely NOT the way one should work with REGEX, I know :sweat_smile:

If you ever want to read up on REs, go for Jeffrey Friedl’s book on them, “Mastering Regular Expressions” (O’Reilly). There’s also a German edition available (at least there was).
