Smart Rule wildcards

AWD · December 19, 2021, 8:15pm

Thank you again. That explanation helped me understand more of how ICU RegEx works.

But I have one thing I do not understand here.

Since I want to modify all files that do not have a certain format, my understanding is that I have to find every file that does not have the correct format. I do that with:

![0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[a-z]*

This doesn’t work perfectly, since the “_” also accepts other symbols such as “- + #” as correct formatting.
I tried to use some RegEx expressions with Unicode to identify the “_”, but the filter didn’t accept any of them.

If I understand your example correctly, the expression

(\d{4})(\d\d)(\d\d)(?:_?\d{4})(-.*)

takes apart a name that looks like this

20211219_1200-Name

and reformats it to

2021_12_19_Name
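As a rough illustration outside DEVONthink, the expression above can be tried in Python, whose `re` module accepts the same syntax for this particular pattern. The helper names `describe` and `reformat` are made up for this sketch. Note that the last group captures the hyphen together with the name, so the hyphen has to be dropped to arrive at `2021_12_19_Name`.

```python
import re

# The expression quoted above; Python's re module reads it the same way.
PATTERN = re.compile(r"(\d{4})(\d\d)(\d\d)(?:_?\d{4})(-.*)")


def describe(name: str):
    """Return the captured groups, or None if the name does not match."""
    match = PATTERN.fullmatch(name)
    return match.groups() if match else None


def reformat(name: str) -> str:
    """Rebuild the name as YYYY_MM_DD_Name; non-matching names are returned unchanged."""
    match = PATTERN.fullmatch(name)
    if match is None:
        return name
    year, month, day, rest = match.groups()
    # rest still starts with the hyphen, so drop it before joining with underscores
    return f"{year}_{month}_{day}_{rest[1:]}"


print(describe("20211219_1200-Name"))  # ('2021', '12', '19', '-Name')
print(reformat("20211219_1200-Name"))  # 2021_12_19_Name
```

A name that is already in the `2021_12_19_Name` form does not match, because the pattern expects eight digits in a row at the start, so it is left alone.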

That is very interesting for me, since I learned a bit more about RegEx. But what I want to do is:
Take any filename that is not in the format

2021_12_19_Name

Check whether there are any numbers or symbols in front of the first letter, which is presumably the beginning of the name, and delete them. If there is nothing in front of the first letter, nothing is deleted.

The next step is to prefix the creation date, which is super simple with change name.

The parts I cannot figure out are:

Make the filter match underscore characters exactly
Figure out whether there are numbers and/or symbols in front of the first letter (amount unknown)
Delete them, or just pass the part starting with the first letter on to the change name function.

I am currently reading through the ICU site to see how to evaluate an undefined number of characters.
Screams do…while but yeah
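A quantifier covers the "unknown amount" case without any loop: `*` means "zero or more of the preceding item". The sketch below uses Python rather than a smart rule and assumes that "letter" means an ASCII letter; the name `strip_prefix` is invented for the example. In ICU, `\p{L}` could stand in for "any letter" if accented characters matter.

```python
import re

# ^ anchors at the start of the name; [^A-Za-z]* matches zero or more
# characters that are not ASCII letters (digits, symbols, spaces, underscores).
LEADING_NON_LETTERS = re.compile(r"^[^A-Za-z]*")


def strip_prefix(name: str) -> str:
    """Remove everything in front of the first letter, if there is anything."""
    return LEADING_NON_LETTERS.sub("", name)


print(strip_prefix("132452345_234234-image.jpg"))  # image.jpg
print(strip_prefix("Name"))                        # Name
```

One edge case to keep in mind: a name without any letters at all would be reduced to an empty string.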

EDIT:

My current approach looks like this, but it doesn’t work as intended

chrillek · December 19, 2021, 9:59pm

I told you already that you have to use regular expressions to achieve what you want. And the rule I posted should only affect those files matching the pattern, i.e. those not already in the desired format.

There’s, I believe, no point in trying to match the name with something like ![…], but I’m repeating myself.

As soon as you can describe your pattern clearly and unambiguously, let me know. I can then try to figure out an expression to match.

AWD · December 19, 2021, 10:35pm

I really don’t want to annoy you or overstress your patience. I know you are a very active and capable user here in the forum. I’ve already seen a lot of your posts here, and I am very happy that you are trying to help me out.

So, first of all, to prevent any misunderstandings, this is what I want to achieve:

A file comes into the inbox from whatever source, and it has an unknown number of numbers, symbols or placeholders in front of the actual name. Say it is this file, though it could be any other combination:

34523-4523645__ 23452646*#+Name

I want to format it to

YYYY_MM_DD_Name

Where the date information is taken from the creation date of the file.

But if there is a file that already has the format YYYY_MM_DD_Name, it should not be touched.

I found this solution, which is not perfect, since it cannot differentiate between YYYY_MM_DD_Name and YYYY-MM-DD-Name and therefore leaves both files untouched. The YYYY-MM-DD-Name version should preferably be changed to YYYY_MM_DD_Name.

My partial achievement looks like this:

If you feel like showing me a better solution, I would be happy to learn.
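The requirements described in this post can be written down as a small decision procedure. The following is only a sketch in Python, not a DEVONthink smart rule: `target_name` is a hypothetical helper, the creation date is passed in by the caller, and "letter" is assumed to mean an ASCII letter. The point it illustrates is that a literal `_` in a regular expression matches only an underscore, so the two date styles can be told apart.

```python
import re
from datetime import date

# Literal underscores: only YYYY_MM_DD_Name counts as already correct.
ALREADY_OK = re.compile(r"\d{4}_\d{2}_\d{2}_.+")
# Same layout with hyphens, which should be converted rather than protected.
HYPHEN_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(.+)")
LEADING_NON_LETTERS = re.compile(r"^[^A-Za-z]*")


def target_name(name: str, created: date) -> str:
    """Return the name a file should end up with."""
    if ALREADY_OK.fullmatch(name):
        return name  # already in the desired format, leave untouched
    hyphenated = HYPHEN_DATE.fullmatch(name)
    if hyphenated:
        # keep the date that is already in the name, only swap the separators
        return "_".join(hyphenated.groups())
    bare = LEADING_NON_LETTERS.sub("", name)
    return f"{created:%Y_%m_%d}_{bare}"


created = date(2021, 12, 19)
print(target_name("34523-4523645__ 23452646*#+Name", created))  # 2021_12_19_Name
print(target_name("2020-01-02-Name", created))                  # 2020_01_02_Name
print(target_name("2020_01_02_Name", created))                  # 2020_01_02_Name
```

Whether a hyphenated date should be kept or replaced by the creation date is a choice the post leaves open; the sketch keeps it.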

BLUEFROG · December 19, 2021, 10:46pm

You are looking for a holy grail, and you are not going to find one.

A file comes into the inbox from whatever source, and it has an unknown number of numbers, symbols or placeholders in front of the actual name. Say it is this file, though it could be any other combination:

This is a conceptual error.
You need uniformity to automate things. Even exceptions - of which there shouldn’t be any or VERY few - should be eliminated, if possible.

34523-4523645__ 23452646*#+Name
I want to format it to
YYYY_MM_DD_Name
Where the date information is taken from the creation date of the file.

Then what does your inquiry here have to do with the filename?
It sounds like you just want to strip out part of the filename and replace it with the creation date of the file in YYYY_MM_DD format.

Also, you still need to define a delimiter between the things to remove and the Name. Is it *, #, or + in your example filename?

AWD · December 19, 2021, 11:09pm

Hi Bluefrog,

What I am trying to achieve is to remove every non-letter in front of the first letter, which is presumably the start of the file name, at least in 99.99% of my cases. The rest I can cover manually.

The rule I created is very close to what I want to achieve. The only problem is that the pre-filtering is too generous: it also protects YYYY-MM-DD-Name, not just YYYY_MM_DD_Name.

If there is a way to achieve this last bit I am completely happy

chrillek · December 20, 2021, 8:33am

I’ll try to summarize in my own words:

You have files coming into the inbox
They can have “any name”
You want to keep only the alphabetical part of the name and prepend it with a formatted version of the creation date.
Your perceived problem is that the pattern you use to match the files to process does not match all the files.

The last difficulty results from the “special” way DT uses to match in normal searches. There’s nothing you can do about that in this context.

What I’d do: Remove the second selection criteria. Simply change all file names to the format you want and then move them to a group. As soon as they are out of the inbox, you don’t have to worry about them anymore. Alternatively, set a tag like “processed” or whatever after you change the name and add “tag is not processed” to the conditions.
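The tag-based variant can be pictured as follows. This is a plain Python model, not DEVONthink's scripting interface: `Record` and `process` are invented stand-ins for an inbox item and a rule action, and letters are assumed to be ASCII.

```python
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    """Hypothetical stand-in for an inbox item; not a DEVONthink API."""
    name: str
    created: date
    tags: set = field(default_factory=set)


def process(record: Record) -> None:
    """Rename the record once, then mark it so later runs skip it."""
    if "processed" in record.tags:
        return  # corresponds to the condition "tag is not processed"
    bare = re.sub(r"^[^A-Za-z]*", "", record.name)
    record.name = f"{record.created:%Y_%m_%d}_{bare}"
    record.tags.add("processed")


item = Record("132452345_234234-image.jpg", date(2021, 12, 20))
process(item)
process(item)  # second run is a no-op because of the tag
print(item.name)  # 2021_12_20_image.jpg
```

With this approach no pattern is needed to recognise names that are already correct; the tag carries that information instead.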

As an aside: I’d go about this very differently. Instead of looking at the file name as it arrives in the inbox, I look at the content of the document (possibly after OCRing) and determine the file name from there. I wrote about that here recently (look for “Hazel”). And then I move the file where it belongs. This, of course, is only sensible if one receives similar documents regularly – and that’s the whole point of automation. No use if I get an invoice from someone once a year, because scripting or otherwise automating that takes more time than simply changing the data by hand.

AWD · December 20, 2021, 2:43pm

Yes, that is correct. I’ve now modified my rule, inspired by your idea of simply processing everything.

I’ve now split it into 3 rules that run one after the other. I guess it is not the most sophisticated solution, but for now it does what I want.

I already have your posting about processing with a script in my bookmarks, and I am planning to work through it.
But for now I am not entirely sure what is worth renaming automatically, since I don’t have that many identical documents arriving every month or week that are worth archiving.
Correcting the prepended date is in any case a daily annoyance for me with every document I receive.

I used to have a very branched folder structure. I am not sure whether that is useful in DT. I was planning to start with a small number of folders and to use the AI search of DT, which is so highly promoted. I don’t know if this will work out.

chrillek · December 20, 2021, 3:21pm

It all depends on what you’re sorting in the database(s). In my case, it is mostly financial stuff: invoices, bank statements and so on. Some of it is regular (utility invoices, account statements), some not. I scripted everything that arrives at least once per month. And I use only three groups: expenses, income and banks, the latter with subgroups for each account. In addition, I use tagging, which is moderately useful for me.

AWD · December 20, 2021, 3:31pm

Thank you for the tips.
Those kinds of documents were also on my list: insurance, financial, house-related and family-related things.

BLUEFROG · December 20, 2021, 4:07pm

In front of what “first letter”?

And you said you want to use the creation date of the file. Is that accurate?

AWD · December 20, 2021, 4:31pm

The first letter in the file name

132452345_234234-image.jpg

I am deleting everything before the i

Not always; sometimes the date should come from the document’s content. Evaluating this is also on my to-do list. But first things first. This at least gives me an easy sort-by-name capability. What I hate most is files getting mixed up in sorting because of inaccurate naming.

tdmayca · February 6, 2022, 7:32pm

I want to thank @chrillek for your post on December 15, 2021. I had been searching for this very solution and found this post by searching on “Scan Name”. Your description of the expression needed to evaluate an existing Name and your reference to the DT3 documentation was invaluable. I have been a DT user since 2011 and had used a date convention in naming files that I recently wanted to change to add dashes between the year, month and day. With the help of your example, I was able to create a smart rule that does exactly what I need and works every time. Greatly appreciate your post.

BLUEFROG · February 7, 2022, 1:18pm

Welcome @tdmayca

We are glad to have you here and pleased the forums provided helpful info to help resolve your issue.