Document Date questions

chrk · March 27, 2021, 2:14pm

I was wondering if document date gets the first found date from the first 4 pages of a file or if there is any other logic behind it.
I haven’t found answers to this in the manual.

Also, I’ve been using newest or oldest document date in smart rules, which works nicely, but in some documents, like in health lab reports, there are 3 dates, one of which is the birth date (oldest), one is the date of the blood draw (the one I want), and the newest is the date of egress. I want to get the date of the blood draw as the document date, which would usually work with oldest, but the birth date is obviously in the way for that.
Is it possible to exclude birth dates from oldest document dates somehow?

Additionally, I noticed that hours and minutes never seem to work in smart rules with document date placeholders. I tried the plain, newest and oldest placeholders, and the time usually comes out as 00:00 (in the filename). The time format in the documents tested is hh:mm, e.g. 14:05.
I’m ok with using the creation date for hours and minutes, just thought I’d mention it.

Thank you.

pete31 · March 27, 2021, 2:32pm

From the AppleScript dictionary, document date :

First date extracted from text of document, e.g. a scan.

BLUEFROG · March 27, 2021, 3:21pm

Neither the file nor DEVONthink has any idea what a birth date is, so this isn’t feasible as suggested. Doing such a thing would depend on the text in the file.

“Additionally, I noticed that hours and minutes never seem to work in smart rules with document date placeholders. I tried the plain, newest and oldest placeholders, and the time usually comes out as 00:00 (in the filename). The time format in the documents tested is hh:mm, e.g. 14:05.”

@cgrunenberg : Confirmed here via AppleScript…

chrillek · March 27, 2021, 3:24pm

You’d have to script that, I guess.

chrk · March 27, 2021, 3:29pm

Thank you.
But since I have no clue about AppleScript, I’ll probably rename everything like that manually, unless someone points me in the right direction.

chrillek · March 27, 2021, 3:42pm

An alternative to scripting could be to look for a marker before or after the date that you want, maybe “blood draw” or something. You could then use that in a smart rule with the action “scan text”…

BLUEFROG · March 27, 2021, 3:49pm

Is this a PDF?
If so, use Data > Convert > to Plain Text and examine the text layer to see whether the date stands alone or whether there is descriptive text before or after it.

chrk · March 27, 2021, 4:28pm

@chrillek @BLUEFROG Thanks guys. Yes, always PDF files that are OCR’d via DEVONthink’s ABBYY integration.

There are two options. One option has text in front of the date (plus some more behind it).

Extraction at:14.12.20 15:35

The other option is in a new line without text before or after.

Date of collection
14.12.20 15:25

Currently reading page 266 of the manual, but it’s not clear to me yet what to do with this.
I guess I’d have to use string instead of date and then somehow convert this format to YYYY-MM-DD.
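For illustration only, the format conversion itself is short when done outside DEVONthink. This is a minimal sketch in Python, not a smart rule action; the function name to_sortable is made up for the example, and it assumes the captured string always looks like DD.MM.YY HH:MM.

```python
from datetime import datetime


def to_sortable(raw: str) -> str:
    """Convert 'DD.MM.YY HH:MM' into the sortable form 'YYYY-MM-DD_HH-MM'."""
    # %y reads the two-digit year; Python maps 00-68 to 2000-2068.
    parsed = datetime.strptime(raw.strip(), "%d.%m.%y %H:%M")
    return parsed.strftime("%Y-%m-%d_%H-%M")


print(to_sortable("14.12.20 15:25"))  # 2020-12-14_15-25
```

Anything that does not match the expected layout raises a ValueError, which is probably what you want when the OCR text is off.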

BLUEFROG · March 27, 2021, 4:34pm

You’re welcome.

If you need date conversions with this kind of text scraping, you’d need to script this for sure. You can’t pass data between an action and an Execute Script action.

cgrunenberg · March 29, 2021, 8:13am

Only dates are currently supported.

BLUEFROG · March 29, 2021, 1:20pm

Noted. Thanks

chrillek · March 29, 2021, 1:41pm

Obviously your “extraction” date consists of a date and a time, whereas all the other dates in your file (presumably) have no time attached (birthdate? I don’t think so). So it would be possible to search for a regular expression like this one:

(\d\d\.\d\d\.\d\d\s+\d\d:\d\d)

assuming that days, months, years, hours and minutes are always (!) given as two-digit values. So one could extract this information from the text, but as @BLUEFROG said, one can’t use this value as is to set the creation date (or any date, it seems). [I seem to be hearing someone crying for more flexibility here: set the date to self-defined value, pass the result of a “scan text/name” operation to an external script … cf. Hazel.]
What you could do, though, is use a custom meta field (a string, probably) and store this date there. This should be possible in a smart rule.
What I could do is write a stand-alone JavaScript script to extract the aforementioned date and set the selected records’ creation date to this value. Unfortunately, this does not work with smart rules due to some problems with DT’s JavaScript support in this context. [which, I think, should be fixed]
I’m sure @pete31 would be able to write an AppleScript script that could do the same in a smart rule.
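To show what the regular expression above picks up, here is a rough sketch in Python (not the AppleScript or JavaScript mentioned, and not something a smart rule can run as is). The sample text and the function name timestamps are invented for the example; if more than one date with a time turns up, taking the earliest is just one possible choice.

```python
import re
from datetime import datetime

# Same pattern as above: DD.MM.YY, whitespace, HH:MM, all two-digit fields.
STAMP = re.compile(r"(\d\d\.\d\d\.\d\d\s+\d\d:\d\d)")


def timestamps(text: str) -> list:
    """Return every date-with-time found in the text, in document order."""
    found = []
    for raw in STAMP.findall(text):
        # Collapse any run of whitespace (even a line break) to one space.
        normalized = " ".join(raw.split())
        found.append(datetime.strptime(normalized, "%d.%m.%y %H:%M"))
    return found


# Made-up sample: the birth date has no time, so the pattern skips it.
sample = (
    "Date of birth 01.02.70\n"
    "Extraction at:14.12.20 15:35\n"
    "Completed 16.12.20 09:10"
)
stamps = timestamps(sample)
print(min(stamps).strftime("%Y-%m-%d_%H-%M"))  # 2020-12-14_15-35
```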

chrk · March 29, 2021, 2:07pm

Thank you for the help @chrillek.

While the birth date doesn’t have a time attached, there is one other date with time attached in those documents, the date of completion or egress (from the lab finishing the analysis), which can be a few days after the blood draw / extraction and would correspond to newest date.

I mostly use the date in the file name with a syntax like 2021-03-28_16-56. I wouldn’t need to change the creation date.

Also, thank you for offering more solutions to this, but since these documents aren’t a very frequent occurrence, I’m ok with doing this manually for now and don’t want to impose work on you guys for something that might also break easily.

@cgrunenberg: I guess this thread could be seen as a feature request, to maybe add the ability to exclude certain dates from smart rule results, or to somehow extend the newest and oldest document date placeholder functionality with a middle or date position 2 of 3 option.

cgrunenberg · March 29, 2021, 2:17pm

The dates can already be specified via the Scan Date action by specifying a prefix and/or suffix.

chrk · March 29, 2021, 2:25pm

Even if I need it formatted as YYYY-MM-DD_HH-MM, while the document only has DD.MM.YY?
The document date placeholders offer many syntax choices, but I don’t understand the scan text > date > … feature yet.

cgrunenberg · March 29, 2021, 2:28pm

No, times are not supported. The Scan Text > Date action only specifies where to search for the date in the document.

chrk · March 29, 2021, 2:51pm

I see. So this is actually what I wanted then and it seems to work in an initial test. Somehow this was easier than expected.

@BLUEFROG: You mentioned that this kind of text conversion needs scripting, but it seems that even if the document has the date in another format, like DD.MM.YY, the following document date placeholder action can be set to change that. That’s pretty cool.

This is my smart rule now:

Everyone, thanks again for the help!

chrillek · March 29, 2021, 3:32pm

Am I correct to assume that “Sortable Document Date” here is the date matched by the “Scan Text” action just before?

chrk · March 29, 2021, 3:38pm

Yes, this wasn’t clear to me at first, even though the manual mentions it:

Date: Similar to String parameter, use the desired format of the Document Date placeholder to represent the captured string in subsequent actions.

So it seems that conversions of dates are possible within the scope of available formats from the document date placeholder, which converts the data from the scan text > date action.

chrk · June 2, 2021, 11:19am

I was wondering if you could help me out with a scan text > date formatting question.

In some of my files, OCR misreads a vertical bar (|) as a 1. So the text layer of those files ends up as:

Date
102.06.21

… instead of:

Date
02.06.21

Is there a way to get this date with scan text > date?

I tried 1*, but there are other lines starting with 1, so this didn’t work.

Date*1* also didn’t work.

I also tried scan text > regular expression as 1(\d\d\.\d\d\.\d\d\), but I guess I’m doing this wrong.

Thank you
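One thing that stands out in the regular expression attempt: the final \) escapes the closing parenthesis, so the capture group is never closed. Python's re module rejects such a pattern outright; whether DEVONthink's engine reports it the same way is an assumption. The sketch below, in Python and outside DEVONthink, shows a pattern that tolerates an optional stray 1 in front of the date. The helper name date_after_label is made up for the example, and it assumes the date sits on the line right after the "Date" label.

```python
import re

# Optional stray "1" (a misread "|"), then capture the real DD.MM.YY date.
# (?<!\d) and (?!\d) keep the match from starting or ending inside a longer number.
OCR_DATE = re.compile(r"(?<!\d)1?(\d\d\.\d\d\.\d\d)(?!\d)")


def date_after_label(text: str, label: str = "Date"):
    """Return the DD.MM.YY string on the line following `label`, or None."""
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == label:
            match = OCR_DATE.search(lines[i + 1])
            if match:
                return match.group(1)
    return None


print(date_after_label("Date\n102.06.21"))  # 02.06.21
print(date_after_label("Date\n02.06.21"))   # 02.06.21

# The attempted pattern does not compile: "\)" is a literal ")", so "(" stays open.
try:
    re.compile(r"1(\d\d\.\d\d\.\d\d\)")
except re.error as exc:
    print("invalid pattern:", exc)
```

Because the leading 1 is optional, a genuine date such as 12.06.21 still comes through unchanged.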