Can Scan Text Date extract a /21.04.20 and how do I make that the created date?

Blanc · June 27, 2020, 1:40pm

I would like to automatically date and name a type of document that I regularly receive. I have a rule that reliably picks out the document, but it just as reliably fails to pick out the document date. This is because every copy of the document contains the date 01.10.2019 near the top right, while the actual document date is in the footer, in the format “BLATT: 2858/21.04.20”. The 4 digits before the / vary, and there are several spaces after the colon, although they usually collapse to a single space on copy and paste. I would like to change the creation date to, in this case, 21.04.20 (DD.MM.YY). DT does not recognise “BLATT: 2664/21.04.20” as containing a date at all (so using “newest document date” doesn’t work), and macOS assumes the line contains a phone number.
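To make the matching problem concrete, here is a minimal sketch in Python rather than in DEVONthink's smart rule syntax. It assumes the footer always reads "BLATT:", some whitespace, four digits, a slash, and then DD.MM.YY; the pattern, the function name `find_footer_date`, and the sample text are all illustrative assumptions, not taken from the actual document.

```python
import re

# Assumed footer layout: "BLATT:", any amount of whitespace, four digits,
# a slash, then the date as DD.MM.YY. The lookahead rejects a longer digit run.
FOOTER_DATE = re.compile(r"BLATT:\s*\d{4}/(\d{2}\.\d{2}\.\d{2})(?!\d)")

def find_footer_date(text):
    """Return the DD.MM.YY string from the footer, or None if it is absent."""
    match = FOOTER_DATE.search(text)
    return match.group(1) if match else None

# Made-up sample: a fixed date near the top, the real date in the footer.
sample = "Header 01.10.2019\nBody text\nBLATT:     2858/21.04.20"
print(find_footer_date(sample))  # prints 21.04.20
```

Because the pattern is anchored on the "BLATT:" prefix, the fixed 01.10.2019 near the top of the page is never considered, and `\s*` absorbs however many spaces follow the colon.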

I thought maybe I could extract the date with Scan Text Date in the rule, but I can’t figure out how. Also, even if I could, it’s not obvious to me how I would then set that date as the created date. Can anybody point me in the right direction? Or is there another (or no) solution? Cheers

BLUEFROG · June 27, 2020, 2:13pm

Like so…?

Blanc · June 27, 2020, 2:32pm

ye-es - now paste me the smart rule :* Pleeeeease

Your post shows that you have successfully extracted the date - does that date automatically become the document date?

BLUEFROG · June 27, 2020, 4:04pm

does that date automatically become the document date?

No, and there’s no Document Date to set, as it’s a read-only property detected by DEVONthink.
You could set a custom metadata field (as I did in the version I attached), change the name, or do something else with the value.

Also, note I captured the parts of the date individually so you can reorder as needed.

Blanc Date from Text.dtSmartRule.zip (1.0 KB)
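The idea of capturing the parts of the date individually can be sketched as follows. This is a Python illustration with a hypothetical helper named `reorder`, not the contents of the attached smart rule, which is not reproduced in the thread. Each of the three groups can be referenced by number in a replacement template, so the same capture can produce day-first, month-first, or sortable output.

```python
import re

# Three capture groups: \1 = day, \2 = month, \3 = two-digit year.
PARTS = re.compile(r"(\d{2})\.(\d{2})\.(\d{2})$")

def reorder(date_text, template):
    """Rewrite a DD.MM.YY string using a template that refers to the groups by number."""
    return PARTS.sub(template, date_text)

print(reorder("21.04.20", r"\1.\2.\3"))    # prints 21.04.20 (day first)
print(reorder("21.04.20", r"\2.\1.\3"))    # prints 04.21.20 (month first)
print(reorder("21.04.20", r"20\3-\2-\1"))  # prints 2020-04-21 (assumes years 2000-2099)
```

The last template is handy for file names because it sorts chronologically, but the hard-coded "20" prefix is an assumption that only holds for dates in this century.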

Blanc · June 27, 2020, 4:13pm

Thanks very much for that. With that in hand I can go off and better understand the syntax. It’s certainly useful, because now I can use part of the date in the name of the document, which is brilliant.

So am I right in saying that there is no way that once that date is extracted it can become the “created” date of the document? Generally when I scan documents I change the “created” date to the date the document was actually created by the author, and was hoping to do that automatically in this case.

Is it a bug that DT does not recognise the line as containing a date (stopping me from using e.g. change created date to newest document date)?

BLUEFROG · June 27, 2020, 4:23pm

You can change the creation date if you want; that’s an available smart rule action. I elect not to change creation dates, as it’s a bad practice in my opinion. A file’s creation date is no different from its birth date: it is a proper record of when the file was created. That’s why I opted for custom metadata, which can be used for arbitrary purposes.

No, it’s not a bug. The date isn’t in a form that is recognized as a valid date. @cgrunenberg would have to assess supporting this variation.

Blanc · June 27, 2020, 4:29pm

I’m aware that a smart rule can change the creation date (and that tastes vary), but I can’t figure out a smart rule that would use the result of the Scan Text action you provided. I’m trying to get this document to be dated (created) 21.04.2020 automatically. Obviously, when DT recognises the document date, that is easy, but how do I do it in this specific case?

(and yes, using custom metadata might well have been a better option - perhaps to be considered; in your example is “DocDate” a custom date field, or a custom text field?)

Thanks again for your help Jim - honestly much appreciated

BLUEFROG · June 27, 2020, 4:35pm

You’re welcome

The DocDate is just Single-Line Text. An attribute using the Date data type wouldn’t work as it’s confined to a set of known date options…

So you are for sure trying to change the creation date with the value scraped from that line, correct?
And what kind of document are you trying to process?

Blanc · June 27, 2020, 4:52pm

Exactly that

It’s a single page document which annoyingly contains “01.10.2019” at the top right and the date the document was produced in the footer.

BLUEFROG · June 27, 2020, 5:27pm

This actually requires scripting.

Due to an issue @cgrunenberg would have to respond to, this also requires an external script. I have attached the script which should be put into ~/Library/Application Scripts/com.devon-technologies.think3/Smart Rules.

The smart rule should be adjusted like so…

Creation Date from Text_SR.scpt.zip (2.3 KB)

For @cgrunenberg:

This handler can’t be used in a smart rule embedded script…

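-- Coerces the scraped string to a date; AppleScript interprets the string using the system's date format settings
on convertDate(scrapedDate)
	return date scrapedDate
end convertDate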

It does work in an external script, hence my approach above.

Blanc · June 27, 2020, 6:30pm

That, dear Jim, is a stunning level of support - it’s something I should be able to do myself and endeavour to learn as time goes by (that’s vague, but honest…). Thank you - personally, and “you” DEVONtech (I wonder whether it is telling that there is no smiley defined as “:humble:”)

BLUEFROG · June 28, 2020, 12:39am

You’re very welcome and thanks for the generous compliments

And perhaps it is telling there is no humble emoji

Blanc · June 28, 2020, 10:15am

I’m going to have to ask for more help though, please; in your script I had to change the date order (from \\2.\\1.\\3 to \\1.\\2.\\3), and from that point on it worked. I thought.

It works reliably if I trigger the rule manually. If the rule triggers automatically “On Import” following OCR (Convert Incoming Scans - to Searchable PDF), however, I get an error from the script: 12:09:39: ~/Library/Application Scripts/com.devon-technologies.think3/Smart Rules/Creation Date from Text_SR.scpt on performSmartRule (Invalid date and time date of «script».) If I trigger the exact same rule manually immediately after receiving the error, it does what it should.

BLUEFROG · June 28, 2020, 1:31pm

What is your system language set to?

if the rule triggers automatically “On Import” following OCR (Convert Incoming Scans - to Searchable PDF)

What are your smart rule settings?

Blanc · June 28, 2020, 1:48pm

Effectively this (there are several Content conditions, and no Type condition); the rule correctly triggers as verified by a Rename action after the Script action, which is performed (but I have tried the rule with only the Script action, same result)

(I’m on the road, so can’t post an image)

The system language is English (UK), with German Date/Time/Punctuation Settings

Blanc · June 28, 2020, 4:34pm

cgrunenberg · June 29, 2020, 11:42am

That’s actually hard to tell without a copy of the document. In case of PDFs it might also be an issue of the PDFkit’s text conversion.
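The thread does not establish why the On Import run fails, so the following is only a defensive sketch, written in Python rather than AppleScript. It illustrates two habits that make this kind of script less fragile: treating missing or not yet converted text as "no date" instead of an error, and parsing with an explicit DD.MM.YY format so that regional date settings play no part. The function name `creation_date_from_text` and the pattern are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed footer layout, as described at the start of the thread.
FOOTER_DATE = re.compile(r"BLATT:\s*\d{4}/(\d{2}\.\d{2}\.\d{2})(?!\d)")

def creation_date_from_text(text):
    """Return a datetime parsed as DD.MM.YY, or None when no usable date is found."""
    match = FOOTER_DATE.search(text or "")
    if match is None:
        return None  # e.g. the text layer is empty or the footer is missing
    try:
        # The format is spelled out, so day and month cannot be swapped
        # by whatever the system's regional settings happen to be.
        return datetime.strptime(match.group(1), "%d.%m.%y")
    except ValueError:
        return None  # digits matched but do not form a real date, e.g. 31.02.20

print(creation_date_from_text("BLATT: 2858/21.04.20"))  # prints 2020-04-21 00:00:00
print(creation_date_from_text(""))                      # prints None
```

For comparison, reading the same string month-first, as in `datetime.strptime("21.04.20", "%m.%d.%y")`, raises a `ValueError` because there is no month 21. A mismatch between the string's order and the order the parser expects is therefore one possible source of an "invalid date" error, although nothing in the thread confirms that this is what happened here.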