DEVONthink 3.5 - Using Smart Rules to extract date field + rename + file pdf

alvin · May 30, 2020, 2:42pm

Hello DEVONthink users,

I’m relatively new to DEVONthink, and am struggling to set up automations to properly handle financial statements.

I’ve been using Hazel to handle this automation for the past 5+ years, and am stuck with reproducing the following workflow using Smart Rules in DEVONthink 3.5:

1. Automatically OCR all incoming PDFs (no issues; implemented in Global Inbox).
2. Extract a desired date field from the entire financial statement (e.g. extract 14 May 2020).
   [Image: 01 PDF Bill]
3. Coerce extracted date field into “YYYYMM” format (i.e. 202005).
4. Ensure PDF contains specific text fields (e.g. name of financial institution, account name).
5. Rename file using “YYYYMM - Custom File Name.pdf”.
6. File processed PDF into the right location in the database.
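
To make the date extraction, coercion and renaming steps above concrete, here is a minimal Python sketch of the logic. It runs outside DEVONthink and is not a Smart Rule. It assumes the OCR text is already available as a string and that dates are written like “14 May 2020” with English month names. The function names `extract_yyyymm` and `build_name` are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime

# Matches dates written like "14 May 2020": day, capitalised month name, 4-digit year.
DATE_PATTERN = re.compile(r"\b(\d{1,2})\s+([A-Z][a-z]+)\s+(\d{4})\b")


def extract_yyyymm(text):
    """Return the first "D Month YYYY" date found in text as "YYYYMM", or None."""
    for match in DATE_PATTERN.finditer(text):
        # Normalise whitespace so OCR output with extra spaces still parses.
        candidate = " ".join(match.groups())
        try:
            # %B parses the full month name; this assumes an English (default) locale.
            parsed = datetime.strptime(candidate, "%d %B %Y")
        except ValueError:
            continue  # looked like a date (e.g. "14 Foo 2020") but is not one
        return parsed.strftime("%Y%m")
    return None


def build_name(text, custom_name):
    """Build "YYYYMM - Custom File Name.pdf", or None when no date is found."""
    yyyymm = extract_yyyymm(text)
    if yyyymm is None:
        return None
    return f"{yyyymm} - {custom_name}.pdf"
```

For example, `build_name("Statement date: 14 May 2020", "Monthly Bill")` gives `202005 - Monthly Bill.pdf`. Taking the first matching date is a simplification; a real statement may contain several dates, so anchoring on a nearby label would be more robust.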

I suppose I can “simplify” the above workflow by “Classifying Document”, but I still need the file name of the PDF to be automatically renamed to “YYYYMM - Custom File Name.pdf” upon ingestion.

I searched the forums, and most workarounds seem to be AppleScript heavy, or they aren’t on DEVONthink 3.5.

Thus, reaching out for some help from the community (note: I’m not a highly technical user).

Thank you,
Alvin

Appendix - Existing Smart Rule Under Development

Appendix - Sharing Existing Hazel Workflow

1. Overall Hazel Rule + Date Extraction

2. Date Coercion

BLUEFROG · May 30, 2020, 2:57pm

4. Ensure PDF contains specific text fields (e.g. name of financial institution, account name).
5. Rename file using “YYYYMM - Custom File Name.pdf”.

I don’t see where you’re using these steps, especially Custom File Name.

Here is an example that queries based on the Invoice number (since that’s the only exposed value). Obviously, you could use “Account No.” and all or part of the actual number.

This returns 202005.

alvin · May 30, 2020, 3:12pm

Sorry Jim, I wasn’t being sufficiently precise.

This is easier to explain in Hazel’s context, where certain conditions need to be met before the entire automation runs (e.g. the right account number exists in the PDF).

Basically, I want to take the date extracted from the PDF, in “YYYYMM” form, and include it in the new file name together with the “Custom File Name” (e.g. Monthly Bill in the example below).
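
The precondition idea mentioned above (only run the automation when the expected text, such as an account number, is present) can be sketched as a simple check. This is an illustrative Python sketch rather than DEVONthink or Hazel configuration; `meets_conditions` and the sample phrases are hypothetical.

```python
def meets_conditions(text, required_phrases):
    """Return True only when every required phrase occurs in text (case-insensitive)."""
    haystack = text.casefold()
    return all(phrase.casefold() in haystack for phrase in required_phrases)


# Hypothetical OCR text and required phrases, for illustration only.
sample_text = "Example Bank\nAccount No. 12345\nStatement date: 14 May 2020"
required = ["Example Bank", "Account No. 12345"]

if meets_conditions(sample_text, required):
    print("Conditions met: safe to rename and file this PDF")
else:
    print("Conditions not met: leave this PDF untouched")
```

Requiring all phrases, rather than any one of them, reduces the chance of renaming a statement from the wrong institution or account.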

chrillek · May 30, 2020, 3:22pm

I don’t quite understand your workflow. Do you want to rework in DT3 what you already have in Hazel? What’s the point?

I’m asking because I do something similar (with 3 banks, 2 databases and about 6 accounts):

1. Download bank statements
2. possibly OCR them using Hazel & PDFPen
3. extract account #, date and account statement number
4. rename file accordingly in Hazel
5. add tag(s) in Hazel
6. move the statement to the appropriate database and group in DT3 with a single script

Trying to do this in DT3 seemed more tedious because it is still (though that’s changing) less flexible in how documents can be renamed.

What are you trying to achieve in DT3 that you’re not already doing in Hazel?

alvin · May 30, 2020, 3:23pm

Thanks Jim, tried it out, but am stuck at the “placeholders”.

What is the syntax behind it (can’t find it in Help)?

Asking as the placeholders seem to have a % % surrounding the text.
Somehow, I’m getting this…

alvin · May 30, 2020, 3:31pm

The key context behind this challenge is to:

Fully eliminate my reliance on Hazel + PDFPen
Handle automatic OCR + file renaming entirely within DEVONthink itself

As such, I’m trying to find the most elegant way to (within DEVONthink):

1. OCR incoming PDFs (done)
2. Extract desired date within PDF document (stuck)
3. Ensure date aligns to YYYYMM syntax (stuck)
4. Rename file (stuck)
5. File in desired Group in DEVONthink’s database (I suppose “Classify” could be used here).

chrillek · May 30, 2020, 3:47pm

Your choice. I’d rather go with “never change a running system”

As to the placeholders: you have to insert them by right-clicking in the text field and then selecting them from the popup menu under Insert Placeholder. See the screenshot. The %…% syntax is apparently not meant for typing placeholders into the field manually.

BLUEFROG · May 30, 2020, 5:03pm

As @chrillek noted, these placeholders are added via the Control-click > Insert Placeholder menu.

My example smart rule with a Change Name instead of Display Alert works on the example image you posted (noting the filename in the titlebar).

alvin · May 31, 2020, 6:07am

AH! THANKS FOR THIS!
I finally learnt how to add a placeholder (silly I know)…

I’m motivated to “change a running system” if it’s for the better?
Trying to reduce the dependencies on Hazel + PDFpen, and am highly drawn to document similarity scoring (i.e. classify) in DEVONthink.

Managing file paths manually is quite painful…

Thanks Jim! I finally got it working!

@chrillek, @BLUEFROG, I really appreciate your rapid assistance with this!

You both just made my weekend!

BLUEFROG · May 31, 2020, 6:17am

You’re welcome

alvin · May 31, 2020, 2:24pm

Hello folks,

Following up on this, do you have any good ideas on how to “trigger” Smart Rules?

Kindly reference my current Smart Rule (mock data used):

[Image: Screenshot 2020-05-31 at 22.12.48]

Challenge:

My desktop notifications are triggering despite having no .pdfs to action upon in the Global Inbox!

What I’m trying to achieve:

After applying an OCR tag (via another Smart Rule) to all incoming .pdfs, I wanted the above smart rule to trigger so as to rename + file my .pdf in its “rightful place”.
I wanted a simple alert to let me know that the operation is complete (but it’s currently triggering non-stop!).

Any good ideas?

Thank you,
Alvin

alanwill · June 11, 2023, 12:15am

Hey @alvin, did you ever figure this out? I’m in exactly your shoes: I want to consolidate and simplify my toolchain and do as much in DT as possible. One of those use cases is similar to what you’re doing, i.e. extracting data from a file, using it as custom metadata, and then running automation on top.

BLUEFROG · June 11, 2023, 2:05pm

Welcome @alanwill

What have you tried?