Welcome @Carsten221
It’s impossible to say how many smart rules would be required.
Vendor to vendor, your data is surely not going to be very uniform, e.g., the content and the position of content on a page. This is compounded by the fact that documents like PDFs are constructed differently from what you see on screen.
It also depends on the technology involved. Are you using built-in smart actions like Scan Text, or scripting, etc.? If you’re going to use AI, are you comfortable with sending document text to a commercial provider online? If not, do you have the hardware and time to allow local models to try to parse things? And are you willing to iteratively fine-tune your prompts and later spot-check documents to ensure things are working?
Those few things being said, is some level of automation possible? Sure, but a 100%, transparent and accurate, hands-free setup? That could be a much harder thing to build and maintain. But there are certainly possibilities.
Here is a fun little construction that honestly took about 10 minutes of messing around. Granted, it’s a stacked deck in a small way but certainly could be used practically…
Like you do in the video, it looks nice. But how do I do it?
About “many Vendors” and “Many Smart Rules” - I thought there was some kind of AI that could do this: extract the vendor name from a document, at least after learning it from a first document by manual assignment. This AI could then automatically construct these smart rules…
About “Technology involved” - I don’t want to do something complicated. I just want to file all my (mostly PDF) documents in a consistent way so I can find them if needed, and have a good overview that everything is “complete” and nothing is missing, e.g. Phone Bill Jan, Feb, Mar, April … are all in the same folder so I can list the folder and see if some month is missing or not. Therefore I don’t like the “chaotic” organization and prefer more structure.
I have something in mind like “Paperless-NGX” is doing… but I don’t want to host a web-service by myself.
I think DT is really powerful, but it also has some complexity.
Yes, you can make it complex, and if you want to use AI it will probably get even more complex. What you expect to happen is probably a so-called “wicked” problem. I don’t want to dissuade you on your journey, but temper expectations about it being “simple”. It’s not the tools, it’s the problem.
There have been numerous threads about that topic. Most of them solved the problem without an LLM, just using smart rules and/or scripts. If you search the forum, you’ll find many useful ideas.
These are simple regular expressions, but be aware you’d need to add the appropriate business names to the second Scan Text action, e.g., when you deal with a new vendor/company.
(?i) could (should) be used in both regexes, just to cover Walmart, WALMART, or even wALmarT
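To illustrate what the (?i) flag buys you, here is a minimal Python sketch. It is not the pattern from the smart rule itself; the vendor names and the find_vendor helper are made up for the example, and Python's regex flavor may differ in small ways from the one DEVONthink uses.

```python
import re

# Hypothetical vendor list for illustration only. In the smart rule, the
# business names would sit in the Scan Text pattern instead.
# (?i) at the start makes the whole expression case-insensitive.
VENDOR_PATTERN = re.compile(r"(?i)\b(walmart|costco|target)\b")


def find_vendor(text):
    """Return the first known vendor found in text, normalized, or None."""
    match = VENDOR_PATTERN.search(text)
    if match is None:
        return None
    # Normalize the capitalization so WALMART and wALmarT file the same way.
    return match.group(1).title()


print(find_vendor("Thank you for shopping at wALmarT today"))
print(find_vendor("WALMART SUPERCENTER #1234"))
print(find_vendor("Some unknown shop"))
```

Adding a new vendor means extending the alternation, which mirrors having to add the business name to the Scan Text action.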
Item is not Tagged and Word Count is greater than or equal to 3 are assumptions, the second being that there is a text layer with at least some text on it.
The Comment field is being used as a variable so it can be used in the later Change Name action. This whole construction is to show the use of built in actions and properties that don’t necessarily require someone to know how to script.
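For anyone who finds it easier to read the logic as code, the flow described above can be sketched roughly like this in Python. This is only an analogy, not DEVONthink's API: the document is a plain dict, the vendor list and the apply_rule function are hypothetical, and the naming scheme (vendor followed by the old name) is an assumption, since the actual Change Name format isn't spelled out here.

```python
import re

# Hypothetical vendor list; (?i) makes the match case-insensitive.
VENDOR_PATTERN = re.compile(r"(?i)\b(walmart|costco)\b")


def apply_rule(doc):
    """Mimic the rule: check conditions, scan text, stash vendor, rename.

    doc is a plain dict with the keys name, text, tags and comment.
    """
    # Condition 1: only touch items that are not tagged yet.
    if doc["tags"]:
        return doc
    # Condition 2: word count >= 3, i.e. there is a usable text layer.
    if len(doc["text"].split()) < 3:
        return doc
    match = VENDOR_PATTERN.search(doc["text"])
    if match is None:
        return doc
    # The comment acts as a variable holding the captured vendor name.
    doc["comment"] = match.group(1).title()
    # Later step: reuse the stored value when changing the name
    # (the "vendor + old name" format is an assumption for this sketch).
    doc["name"] = f"{doc['comment']} {doc['name']}"
    return doc


receipt = {"name": "scan001", "text": "WALMART receipt total 12.50",
           "tags": [], "comment": ""}
print(apply_rule(receipt)["name"])
```

The point of the intermediate comment value is the same as in the smart rule: one step captures the text, a later step uses it, and no scripting knowledge is needed to follow the chain.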