How to extract accurate document dates from statements

I’ve been unsuccessful figuring out how to extract accurate document dates from some financial statements. For example, here is the statement date from one:

Statement Period
July 1-31, 2024

And another:

Statement Period: Jun 12 2024-Jul 11 2024

I’d like to use the extracted document dates to rename and file the statements using smart rules. One consistent way to do that would be to use the month in which the statement period ends as the document date, i.e. 2024-07 in my first example above and again 2024-07 in my second one.

I’ve tried DT’s built-in extracted dates (e.g., document date, newest date) with no luck. These kinds of statements often contain many dates of course, so that’s not surprising. I’m aware of the Scan Text for date facility but have not yet found a way to make that work with these kinds of date ranges. Ideas?

No surprise there – your first example doesn’t match a valid date, and the second matches two dates.

Given the inconsistent format (e.g. Jun/Jul vs. July, and a period within one month vs. one spanning two), that’s probably not easy.
One or more regular expressions in a script might work. For example, this one:

/(Jul.*?|Jun.*?)(\d+\s+\d{4}\-(Jul.*?|Jun.*?)\s+\d+\s+\d{4}|\d+\s*-\d+,?\s*\d{4})/gm

matches both the July in your first and Jun in your second example and saves them in capturing group 1. You’d have to add all the other months as well, though. That’s when it becomes even uglier.

Not to mention that in the second case you want Jul, not Jun (the end month lands in capturing group 3).
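A minimal sketch of how such a script could cover all twelve months and pick the end month rather than the start month, in plain JavaScript. It assumes the statement period appears in the text in one of the two layouts from the question and returns the month in which the period ends as YYYY-MM. The names `statementEndMonth`, `CROSS_MONTH` and `SAME_MONTH` are made up for illustration; getting a document's text in and setting its date from a smart rule is not shown.

```javascript
// Any month name, full or abbreviated ("Jul", "July", "Sept", "Sep."), as a whole word.
const MONTH =
  "\\b(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|" +
  "Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\\b\\.?";
const DASH = "\\s*[-\u2013]\\s*"; // hyphen or en dash, optional spaces around it

// "Jun 12 2024-Jul 11 2024": start and end each have month, day (optional comma) and year.
// Groups: 1 = start month, 2 = end month, 3 = end year.
const CROSS_MONTH = new RegExp(
  MONTH + "\\s+\\d{1,2},?\\s+\\d{4}" + DASH + MONTH + "\\s+\\d{1,2},?\\s+(\\d{4})", "i");
// "July 1-31, 2024": one month, a day range, and the year once at the end.
// Groups: 1 = month, 2 = year.
const SAME_MONTH = new RegExp(MONTH + "\\s+\\d{1,2}" + DASH + "\\d{1,2},?\\s+(\\d{4})", "i");

const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"];

// Hypothetical helper: the month in which the statement period ends, as "YYYY-MM", or null.
// Assumes one of the two layouts above; other statements may need more patterns.
function statementEndMonth(text) {
  let month, year;
  let m = text.match(CROSS_MONTH);
  if (m) {
    [month, year] = [m[2], m[3]]; // take the END of the range, not the start
  } else if ((m = text.match(SAME_MONTH))) {
    [month, year] = [m[1], m[2]];
  } else {
    return null;
  }
  const number = MONTHS.indexOf(month.slice(0, 3).toLowerCase()) + 1;
  return `${year}-${String(number).padStart(2, "0")}`;
}

console.log(statementEndMonth("Statement Period\nJuly 1-31, 2024"));         // 2024-07
console.log(statementEndMonth("Statement Period: Jun 12 2024-Jul 11 2024")); // 2024-07
```

The two-month pattern is tried first because it is the one that exposes the end month and its year (which matters for a December-to-January period). A layout such as "Jun 12-Jul 11, 2024", with the year only at the end, matches neither pattern and would need a third one.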

I’d script that. Or perhaps you can extract the month and year from the file name or some other part of the document. For example, some telecom bills mention the month they’re referring to in their text.
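A sketch of the file-name route, assuming (hypothetically) that the downloaded files carry the statement's closing date in a form like `2024-07-11`, `2024_07`, or `20240711`. The function name `monthFromFileName` and these layouts are illustrative only; check what your bank's file names actually look like and adjust the regex accordingly.

```javascript
// Hypothetical file-name layouts this sketch assumes (adjust to your bank's):
//   "Statement 2024-07-11.pdf", "statement_20240711.pdf", "Card 2024_07.pdf"
// A 19xx/20xx year, optional separator, two-digit month, optional two-digit day,
// not preceded or followed by another digit.
const NAME_DATE = /(?:^|\D)((?:19|20)\d{2})[-_.]?(\d{2})(?:[-_.]?\d{2})?(?!\d)/g;

// Returns the first plausible "YYYY-MM" found in the file name, or null.
function monthFromFileName(name) {
  for (const [, year, month] of name.matchAll(NAME_DATE)) {
    const m = Number(month);
    if (m >= 1 && m <= 12) return `${year}-${month}`; // skip digit pairs that aren't months
  }
  return null;
}

console.log(monthFromFileName("Statement 2024-07-11.pdf")); // 2024-07
console.log(monthFromFileName("statement_20240711.pdf"));   // 2024-07
console.log(monthFromFileName("Card 2024_07.pdf"));         // 2024-07
console.log(monthFromFileName("Invoice.pdf"));              // null
```

Requiring a 19xx/20xx year and a month between 01 and 12 keeps account or reference numbers in the file name from being mistaken for a date.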

I normally use Hazel for this sort of thing, but other posts here (search the forum for them) show that DEVONthink can do it too. The trick is not to assume that the OCR “text layer” contains what you think it should. Start by reading “How to deal with PDF Searchability”, available under the “…” icon at the top of the DEVONthink Sidebar.


Thanks, chrillek. Not sure why I didn’t think of it, but your suggestion to extract the needed date from the file names has worked very well so far.