Format of date extracted from document

In Beta1, I created a smart rule to file a specific bill that I have to pay monthly. To set the date for the document, I used the rule: Change Date → Document Date.

Since I receive the bill from an entity in South Africa, the date format is dd/mm/yyyy.

  • In Beta1, DEVONthink correctly interpreted 03/05/2019 as 5 May 2019.
  • In Beta2, DEVONthink assumes the American format, and 04/06/2019 becomes 6 April instead of 4 June. :frowning:

Is there any way in which I can control the format of the date being scraped from the document?

Beta3?

Sorry, my brain is elsewhere. I mean beta 1 and 2. I’ll amend my post…

Do all of your documents use this format, or do some or most of them use the American format?

Hi @cgrunenberg. I made a trivial Smart Rule just to test this (see below). Almost all documents I have originate either in Germany or South Africa.

These dates work just fine:

  • German standard: dd.mm.yyyy
  • South African standard: yyyy-mm-dd

These do not work:

  • Old South African standard: dd/mm/yyyy

The latter gets interpreted as a US date, i.e. mm/dd/yyyy. Of course there’s no way to automatically distinguish dd/mm/yyyy from mm/dd/yyyy unless dd > 12. :frowning:
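To make the ambiguity concrete, here is a small Python sketch. It is only an illustration, not how DEVONthink parses dates, and the function name `possible_readings` is made up for this example. It lists which readings of a slash-separated date are valid calendar dates.

```python
from datetime import date

def possible_readings(text):
    """Return every valid interpretation of an nn/nn/yyyy date string."""
    a, b, year = (int(part) for part in text.split("/"))
    candidates = {"dd/mm/yyyy": (a, b), "mm/dd/yyyy": (b, a)}
    readings = {}
    for label, (day, month) in candidates.items():
        try:
            readings[label] = date(year, month, day)
        except ValueError:
            pass  # e.g. a "month" above 12: this reading is impossible
    return readings

print(possible_readings("04/06/2019"))  # two valid readings: ambiguous
print(possible_readings("25/06/2019"))  # only dd/mm/yyyy is possible
```

Only when one of the two readings is impossible can the format be inferred from the date alone.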

That’s unfortunately true. And the current system settings don’t help with international documents either.

One possibility might be to use a small script to swap the values afterwards. A similar script could also be used in smart rules, but only if the conditions of the smart rule ensure that it matches only documents using the old South African standard.
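The swap itself is simple. Below is a minimal sketch in Python rather than in DEVONthink's own scripting, assuming the misread date is already available as a `datetime.date`; the helper name `swap_day_month` is hypothetical.

```python
from datetime import date

def swap_day_month(misread):
    """Undo a mm/dd misreading by exchanging day and month."""
    if misread.day > 12:
        # The day cannot serve as a month, so this date was not misread.
        raise ValueError("day is greater than 12; nothing to swap")
    return misread.replace(day=misread.month, month=misread.day)

# 04/06/2019 misread as 6 April is corrected to 4 June.
print(swap_day_month(date(2019, 4, 6)))
```

A swap like this is only safe when every document it touches really uses dd/mm/yyyy, which is why the smart rule's conditions matter.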

Allow me to return to this old topic…

I still feel it would be a useful addition if DT3 included some way to specify the format of a date to be scanned for. In the meantime, I’m working around this limitation by using the Scan Text and Scan Name actions (sans any scripting!)

I’ll be the first to admit this is something of a hack:

  • First, I scan the dd/mm/yyyy date from the document using Scan Text with a regular expression.
  • I then embed a yyyy-mm-dd string in the document’s name using backrefs.
  • Finally, I scan the document’s name for a date using Scan Name.
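Outside DEVONthink, the same capture-and-reorder idea can be sketched in Python. This only mirrors the logic of the steps above; the pattern, the function name `iso_from_dmy`, and the sample text are assumptions for illustration, not the contents of the actual Smart Rule.

```python
import re

# Three capture groups: day, month, year.
DMY = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

def iso_from_dmy(text):
    """Find the first dd/mm/yyyy date and return it as yyyy-mm-dd."""
    match = DMY.search(text)
    if match is None:
        return None
    # Backreferences reorder the captured groups into ISO order.
    return match.expand(r"\3-\2-\1")

print(iso_from_dmy("Account summary 04/06/2019"))  # 2019-06-04
```

Because yyyy-mm-dd is unambiguous, whatever parses the rewritten string no longer has to guess the order of day and month.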

It … works?!

Attached, a screenshot of a simplified version of the Smart Rule.

That begs the question: how?

  • If you want to specify that on a per-document basis, you’re no better off than you are right now with your smart rule “hack”.
  • If, on the other hand, you want to be able to specify that generally, you’re no better off than you are right now with a system-wide setting.

If (big if) DT were to know the originating locale of the document, it could make a more educated guess. Something like “hey, I can see that this document is coming from South Africa, so we’ll use the ‘en-ZA’ locale, where the date is specified as …”. Analogously for French, German, Chinese (OK, China is in fact simple because of its unambiguous way of writing a date), etc. That just might be possible for invoices by (for example) scanning the address of the sender.

But it’s more or less what you do already with your smart rule by scanning for “Account summary …”

One idea (off the top of my head and not properly thought through) would be to use a format string, perhaps like the one used by the UNIX date(1), instead of a simple * when extracting a date using Scan Text or Scan Name.
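For a sense of what that could look like, Python's `datetime.strptime` takes directives similar to those of date(1). This is just a sketch of the idea, not an existing DEVONthink feature, and `parse_with_format` is a made-up name.

```python
from datetime import datetime

def parse_with_format(text, fmt="%d/%m/%Y"):
    """Parse a date using an explicit, user-supplied format string."""
    return datetime.strptime(text, fmt).date()

# The same text yields different dates depending on the stated format.
print(parse_with_format("04/06/2019"))              # day first
print(parse_with_format("04/06/2019", "%m/%d/%Y"))  # month first
```

With the format stated explicitly, no guessing is needed, at the cost of the user having to know each document's convention.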

Hmm. Files in DT3 do have Country metadata, so one could argue that — once the originating country has been set — one could resort to the standardised date format for that country. But there are all sorts of stumbling blocks… I don’t think there’s any country where the official date format is used exclusively in all cases!

I’m not really complaining. Date and time formatting is a hard problem, and my little hack/workaround solves the particular issue I’d been having.

(Another generic idea would be to allow the user to store values in variables when defining Smart Rules. Effectively I’m using the filename as a temporary value store in my workaround.)

:+1: Looking at Hazel …