Regex to extract dates from PDFs

In the example I am working on, I want to file receipts for tax purposes. I scanned a large number of them with Readdle's Scanner Pro on the iPhone. This app even allows for a workflow to rename a file to something like “receipt-YYYY-MM-DD” and upload it to a Dropbox folder, where I collect them (a folder indexed by DT). Unless I scan the receipts immediately, the date of scanning will not be the correct one, hence the desire for a smart rule in DT to help.

In my collection for this year, I only found four variations. Receipts may have an advantage here, since there will most likely be only one date in the file. The variations I am trying to cover are:

  • Variation 1: DD.MM.202Y (date and month with or without leading ‘0’)

    • Variation 1.1: D.MM.202Y
    • Variation 1.2: D.M.202Y
    • Variation 1.3: D.M.2Y
  • Variation 2: DD.M.202Y (without leading ‘0’ for the month)

    • Variation 2.1: DD.M.2Y
  • Variation 3: 202Y-MM-DD (date and month with or without leading ‘0’)

  • Variation 4: DD/MM/202Y (date and month with or without leading ‘0’)

    • Variation 4.1: D/M/202Y

The sub-cases did not actually occur in my collection, but I can imagine they may turn up.

I assume that restricting the year to 202Y should also make the search quite specific. (I should not be more than two years behind with my taxes 🙂)
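To make the idea concrete, here is a minimal sketch in Python of how the variations above might be covered with two patterns, with the year restricted to 202Y (or the short form 2Y). The names `DMY`, `YMD` and `find_receipt_date` are made up for illustration, and I have not checked whether the same pattern syntax carries over unchanged to a DT smart rule, so treat it as a starting point rather than a finished solution.

```python
import re

# Hypothetical patterns, for illustration only.
# DMY: day and month with or without leading zero, '.' or '/' as separator
# (the same separator must be used twice), year written as 202Y or 2Y.
DMY = re.compile(r"(?<!\d)(\d{1,2})([./])(\d{1,2})\2(?:20)?(2\d)(?!\d)")
# YMD: 202Y-MM-DD, month and day with or without leading zero.
YMD = re.compile(r"(?<!\d)20(2\d)-(\d{1,2})-(\d{1,2})(?!\d)")


def find_receipt_date(text):
    """Return the first date found as 'YYYY-MM-DD', or None if nothing fits."""
    m = DMY.search(text)
    if m:
        day, month, year = int(m.group(1)), int(m.group(3)), int(m.group(4))
    else:
        m = YMD.search(text)
        if m is None:
            return None
        year, month, day = int(m.group(1)), int(m.group(2)), int(m.group(3))
    # Rough plausibility check; does not validate days per month.
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return f"20{year:02d}-{month:02d}-{day:02d}"


if __name__ == "__main__":
    for sample in ["Datum: 12.03.2024", "5.3.24", "1/2/2023", "2024-3-7"]:
        print(sample, "->", find_receipt_date(sample))
```

The result is already in the YYYY-MM-DD form, so it could be dropped straight into a name like “receipt-YYYY-MM-DD”. Note that a short form such as D.M.2Y can also match things that are not dates (a version number, for example), which is one reason to rely on receipts usually containing only one date.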

Wish me luck and in case someone has an idea for my case, let me know!

Olaf
