Auto-renaming PDF after OCR based on content

bosie · December 11, 2020, 10:25am

Hi,

I have been searching for a way to automatically rename PDFs after OCR based on their content. I don’t want to use rules (some documents might be one-offs). The PDFs are scanned documents like recipes, invoices, confirmations, doctor’s notes, etc.
Does anyone know of software that does this (either OCR plus renaming, or just the renaming) and fits into a DT workflow?

Thank you

cgrunenberg · December 11, 2020, 10:31am

How should the documents actually be named?

bosie · December 11, 2020, 10:42am

With the important entities and the date,
e.g. “DATE-doctor-issue”, “INVOICEDATE-company-product” or “DATE-company-nameOfInvestmentFund-typeOfMessage”.
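A minimal sketch of that naming scheme, assuming the date and the entities have already been pulled out of the OCR text (the extraction is the hard part this thread is about). `build_name` and the sample entity values are hypothetical, not a DEVONthink or Hazel feature.

```python
import re
from datetime import date


def build_name(doc_date: date, *entities: str) -> str:
    """Join a sortable date and entity strings into e.g. '2020-12-11-company-product.pdf'."""
    parts = [doc_date.isoformat()]  # YYYY-MM-DD sorts chronologically by file name
    for entity in entities:
        # Collapse runs of characters that are awkward in file names into one dash
        cleaned = re.sub(r"[^\w]+", "-", entity.strip()).strip("-")
        if cleaned:  # skip entities that end up empty
            parts.append(cleaned)
    return "-".join(parts) + ".pdf"


print(build_name(date(2020, 12, 11), "Vanguard", "Example Fund", "statement"))
# 2020-12-11-Vanguard-Example-Fund-statement.pdf
```

Putting the ISO date first is what makes a folder of renamed files list in chronological order.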

cgrunenberg · December 11, 2020, 10:57am

Theoretically this should be doable using a few smart rules:

bosie · December 11, 2020, 11:12am

I’m trying to get the first one working, but I think I’m doing something wrong.

After applying the rule:

cgrunenberg · December 11, 2020, 11:53am

I’ve never seen this before, even in case of documents not containing a date. Did you insert the placeholder via the contextual menu?

bosie · December 11, 2020, 12:12pm

I typed it in and wrapped it in %. I can’t find it in the contextual menu, by the way.

cgrunenberg · December 11, 2020, 12:13pm

In the contextual menu of, e.g., the Change Name field, there should be an Insert Placeholder submenu.

bosie · December 11, 2020, 12:13pm

There is a submenu, but “Sortable Document Date” isn’t among the options.

Edit: it seems to be the option ‘Document Date -> 2020-11-12’.

BLUEFROG · December 11, 2020, 12:16pm

Choose the date type, e.g. Oldest Document Date, then choose the format, e.g. 2020-12-11.
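For anyone scripting this outside DEVONthink, here is a rough analogue of an “oldest document date” placeholder: scan the OCR text for dates and return the oldest one as YYYY-MM-DD. It is an assumption for illustration, not how DEVONthink detects dates; it recognises only two hypothetical patterns (DD.MM.YYYY and YYYY-MM-DD).

```python
import re
from datetime import date
from typing import Optional

# Two illustrative patterns only: 11.12.2020 (day first) and 2020-12-11 (ISO)
DMY = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")
ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def oldest_date(text: str) -> Optional[str]:
    """Return the oldest valid date found in text as YYYY-MM-DD, or None."""
    candidates = [(int(y), int(m), int(d)) for d, m, y in DMY.findall(text)]
    candidates += [(int(y), int(m), int(d)) for y, m, d in ISO.findall(text)]
    valid = []
    for y, m, d in candidates:
        try:
            valid.append(date(y, m, d))
        except ValueError:  # e.g. 31.02.2021 is not a real date
            pass
    return min(valid).isoformat() if valid else None


ocr_text = "Invoice date: 11.12.2020  Delivery: 2020-11-12  Due: 31.02.2021"
print(oldest_date(ocr_text))  # 2020-11-12
```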

bosie · December 11, 2020, 12:16pm

I got it to work, but this is basically the same problem I have with Hazel: every new invoice requires a change in the rules, defeating the purpose of having rules in the first place.

cgrunenberg · December 11, 2020, 12:18pm

My example for…

…is able to handle multiple products & companies, see screenshot. But of course it’s hard to tell without knowing the actually desired names and the actual documents and how often you have to process new types of documents.

bosie · December 11, 2020, 12:19pm

But you are hardcoding both the products and the companies?

BLUEFROG · December 11, 2020, 12:19pm

Criss’ smart rule covers three distinct variations, i.e., three different kinds of documents. You only entered one criterion, while he used OR matching via the pipe (|) to specify several.
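To make the pipe concrete: an OR over a fixed list behaves like a regular-expression alternation. A minimal Python sketch using the three companies BLUEFROG names later in the thread (Apple, Microsoft, IBM); `COMPANIES` and `match_company` are hypothetical and say nothing about how DEVONthink evaluates its conditions. It also illustrates bosie’s objection: a company missing from the list is not recognised until the list is edited.

```python
import re
from typing import Optional

# Hardcoded alternatives, analogous to "Apple|Microsoft|IBM" in a rule condition
COMPANIES = re.compile(r"\b(Apple|Microsoft|IBM)\b")


def match_company(text: str) -> Optional[str]:
    """Return the first listed company mentioned in text, or None."""
    m = COMPANIES.search(text)
    return m.group(1) if m else None


print(match_company("Invoice from Microsoft Ireland"))  # Microsoft
print(match_company("Invoice from ABC GmbH"))  # None: ABC must be added by hand
```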

cgrunenberg · December 11, 2020, 12:20pm

Yes. A new company/product would require an update of the rule.

bosie · December 11, 2020, 12:20pm

Well, no. His smart rule doesn’t cover anything general, as hardcoding entities is the opposite of what you should do. Say I buy something from seller ABC: now I need to add ABC to the hardcoded entity list.

bosie · December 11, 2020, 12:21pm

Here is my Hazel rule for renaming ‘Vanguard statements’:

Same principle as Criss’ rule, but that’s exactly what I would like to avoid.

BLUEFROG · December 11, 2020, 12:21pm

I didn’t say it covered everything. I said it covers three types. And yes, of course you’d need to modify the rule to support new variants.

bosie · December 11, 2020, 12:22pm

Sorry, what three types, by the way?

BLUEFROG · December 11, 2020, 12:24pm

Documents from Apple, Microsoft, or IBM.