I was wondering if document date gets the first found date from the first 4 pages of a file or if there is any other logic behind it.
I haven’t found answers to this in the manual.
Also, I’ve been using newest or oldest document date in smart rules, which works nicely, but in some documents, like in health lab reports, there are 3 dates, one of which is the birth date (oldest), one is the date of the blood draw (the one I want), and the newest is the date of egress. I want to get the date of the blood draw as the document date, which would usually work with oldest, but the birth date is obviously in the way for that.
Is it possible to exclude birth dates from oldest document dates somehow?
Additionally, I noticed that hours and minutes never seem to work in smart rules with document date placeholders. I tried the plain, newest and oldest placeholders, and the time usually comes out as 00:00 in the filename. The time format in the documents tested is hh:mm, e.g. 14:05.
I’m ok with using the creation date for hours and minutes, just thought I’d mention it.
Neither the file nor DEVONthink has any idea what a birth date is, so this isn’t feasible as suggested. Doing such a thing would depend on the text in the file.
Thank you.
But since I have no clue about Apple Script, I’ll probably rename everything like that manually, unless someone points me in the right direction.
An alternative to scripting could be to look for a marker before or after the date that you want, maybe “blood draw” or something. You could then use that in a smart rule with the action “scan text”…
Is this a PDF?
If so, use Data > Convert > to Plain Text and examine the text layer to see if the date stands alone or if there is the descriptive text before or after that date.
@chrillek @BLUEFROG Thanks guys. Yes, always PDF files that are OCR’d via DEVONthink’s ABBYY integration.
There are two options. One option has text in front of the date (plus some more behind it).
Extraction at:14.12.20 15:35
The other option is in a new line without text before or after.
Date of collection
14.12.20 15:25
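For illustration only, here is a minimal Python sketch of the marker idea applied to these two layouts, run outside DEVONthink on the plain-text layer. The function name `find_labelled_date` and the label list are assumptions made up for this example; they are not part of DEVONthink or its smart rules.

```python
import re

# Hypothetical labels, copied from the two sample layouts in this thread.
LABELS = ("Extraction at:", "Date of collection")

# One of the labels, optional whitespace (which may include a line break),
# then a date and time in the form DD.MM.YY hh:mm.
PATTERN = re.compile(
    r"(?:" + "|".join(re.escape(label) for label in LABELS) + r")"
    r"\s*(\d{2}\.\d{2}\.\d{2}\s+\d{2}:\d{2})"
)


def find_labelled_date(text):
    """Return the first date string that follows a known label, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


same_line = "Extraction at:14.12.20 15:35 by the lab\n"
next_line = "Date of collection\n14.12.20 15:25\n"
print(find_labelled_date(same_line))
print(find_labelled_date(next_line))
```

Because the date has to follow a label, a birth date elsewhere in the text is ignored without any special handling.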
Currently reading page 266 of the manual, but it’s not clear to me yet what to do with this.
I guess I’d have to use string instead of date and then somehow convert this format to YYYY-MM-DD.
If you need date conversions with this kind of text scraping, you’d need to script this for sure. You can’t pass data between an action and an Execute Script action.
Obviously your “extraction” date consists of a date and a time. Whereas all the other dates in your file (presumably) have no time attached (birthdate? I don’t think so). So it would be possible to search for a regular expression like this one
(\d\d\.\d\d\.\d\d\s+\d\d:\d\d)
assuming that days, months, years, hours and minutes are always (!) given as two-digit values. So one could extract this information from the text, but as @BLUEFROG said, one can’t use this value as is to set the creation date (or any date, it seems). [I seem to be hearing someone crying for more flexibility here: set the date to a self-defined value, pass the result of a “scan text/name” operation to an external script … cf. Hazel.]
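To show what the expression above captures and how the result could be turned into a filename stamp, here is a small stand-alone Python sketch. It does not run inside a smart rule; `to_filename_stamp` is a hypothetical helper name, and the target format `YYYY-MM-DD_HH-MM` is an assumption about the desired filename.

```python
import re
from datetime import datetime

# The expression from the post above: DD.MM.YY, whitespace, hh:mm.
STAMP = re.compile(r"(\d\d\.\d\d\.\d\d\s+\d\d:\d\d)")


def to_filename_stamp(text):
    """Find the first date with a time and reformat it as YYYY-MM-DD_HH-MM."""
    match = STAMP.search(text)
    if match is None:
        return None
    # Collapse any run of whitespace between date and time to one space.
    raw = " ".join(match.group(1).split())
    # %y reads a two-digit year; Python maps 00-68 to 2000-2068.
    parsed = datetime.strptime(raw, "%d.%m.%y %H:%M")
    return parsed.strftime("%Y-%m-%d_%H-%M")


print(to_filename_stamp("Extraction at:14.12.20 15:35"))
```

A date without a time, such as a birth date, does not match the expression and is skipped.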
What you could do, though, is use a custom meta field (a string, probably) and store this date there. This should be possible in a smart rule.
What I could do is write a stand-alone JavaScript script to extract the afore-mentioned date and set the selected records’ creation date to this value. Unfortunately, this does not work with smart rules due to some problems with DT’s JavaScript support in this context. [which, I think, should be fixed]
I’m sure @pete31 would be able to write an AppleScript script that could do the same in a smart rule.
While the birth date doesn’t have a time attached, there is one other date with time attached in those documents, the date of completion or egress (from the lab finishing the analysis), which can be a few days after the blood draw / extraction and would correspond to newest date.
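In a script outside DEVONthink, that situation could be handled by collecting every date that has a time attached and keeping the earliest one. The following Python lines are only a sketch under that assumption; `earliest_timed_date` is a made-up name, the sample text is invented, and the sketch requires date and time to sit on the same line so that a bare date is never joined with a time from the next line.

```python
import re
from datetime import datetime

# DD.MM.YY followed on the same line by hh:mm (spaces or tabs in between).
TIMED = re.compile(r"\d\d\.\d\d\.\d\d[ \t]+\d\d:\d\d")


def earliest_timed_date(text):
    """Return the earliest date that carries a time, or None if there is none."""
    stamps = [
        datetime.strptime(" ".join(found.split()), "%d.%m.%y %H:%M")
        for found in TIMED.findall(text)
    ]
    return min(stamps) if stamps else None


# Invented sample: birth date without time, blood draw, then egress.
sample = (
    "Date of birth 03.05.70\n"
    "Date of collection\n"
    "14.12.20 15:25\n"
    "Completed 16.12.20 09:10\n"
)
print(earliest_timed_date(sample))
```

The birth date drops out because it has no time, and the egress date drops out because it is later than the blood draw.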
I mostly use the date in the file name with a syntax like 2021-03-28_16-56. I wouldn’t need to change the creation date.
Also, thank you for offering more solutions to this, but since these documents aren’t a very frequent occurrence, I’m ok with doing this manually for now and don’t want to impose work on you guys for something that might also be vulnerable to break easily.
@cgrunenberg: I guess this thread could be seen as a feature request, to maybe add the ability to exclude certain dates from smart rule results, or to somehow extend the newest and oldest document date placeholder functionality with a middle or date position 2 of 3 option.
Even if I need it formatted as YYYY-MM-DD_HH-MM, while the document only has DD.MM.YY?
The document date placeholders offer many syntax choices, but I don’t understand the scan text > date > … feature yet.
I see. So this is actually what I wanted then and it seems to work in an initial test. Somehow this was easier than expected.
@BLUEFROG: You mentioned that this kind of text conversion needs scripting, but it seems that even if the document has the date in another format, like DD.MM.YY, the following document date placeholder action can be set to change that. That’s pretty cool.
Yes, this wasn’t clear to me at first, even though the manual mentions it:
Date: Similar to String parameter, use the desired format of the Document Date placeholder to represent the captured string in subsequent actions.
So it seems that conversions of dates are possible within the scope of available formats from the document date placeholder, which converts the data from the scan text > date action.