Thank you again. That explanation helped me understand how ICU RegEx works.
But there is one thing I do not understand here.
Since I want to modify all files that do not have a certain format, my understanding is that I have to find every file that does not have the correct format. I do that with:
This does not work perfectly, since the “_” also accepts other symbols such as “- + #” as correct formatting.
I tried to use some RegEx expressions with Unicode escapes to identify the “_”, but the filter didn’t accept any of them.
If I understand your example correctly, the expression
(\d{4})(\d\d)(\d\d)(?:_?\d{4})(-.*)
takes apart a name that looks like this
20211219_1200-Name
and reformats it to
2021_12_19_Name
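To make that reading concrete, here is a minimal sketch in Python. It uses Python's `re` module rather than the ICU engine discussed here; for this particular pattern the syntax should behave the same, but that is an assumption worth checking in the application itself. The helper names `split_name` and `reformat` are hypothetical. Note that the last group, `(-.*)`, captures the hyphen together with the name, so the hyphen has to be replaced explicitly to end up with `2021_12_19_Name`.

```python
import re

# The pattern quoted in the post above.
PATTERN = re.compile(r"(\d{4})(\d\d)(\d\d)(?:_?\d{4})(-.*)")

def split_name(name):
    """Return the four captured groups, or None if the name does not match."""
    match = PATTERN.fullmatch(name)
    return match.groups() if match else None

def reformat(name):
    """Rebuild a matching name as YYYY_MM_DD_Name; leave other names alone."""
    parts = split_name(name)
    if parts is None:
        return name
    year, month, day, rest = parts
    # rest starts with the hyphen captured by (-.*); swap it for an underscore.
    return f"{year}_{month}_{day}_{rest[1:]}"

print(split_name("20211219_1200-Name"))  # ('2021', '12', '19', '-Name')
print(reformat("20211219_1200-Name"))    # 2021_12_19_Name
```

The time part (`_1200`) sits in a non-capturing group, which is why it disappears from the result.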
That is very interesting for me, since I learned a bit more about RegEx. But what I want to do is:
Take any filename that is not in the format
2021_12_19_Name
Check whether there are any numbers or symbols in front of the first letter, which is presumably the beginning of the name, and delete them. If there is nothing in front of the first letter, nothing is deleted.
The next step is to prefix the creation date, which is super simple with Change Name.
The parts I cannot figure out are:
Make the filter match underscore characters exactly
Figure out whether there are numbers and/or symbols in front of the first letter (amount unknown)
Delete them, or just pass the part starting with the first letter on to the Change Name function.
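The last two points, detecting and dropping an unknown number of leading non-letters, can be sketched with a single anchored expression. The example below is in Python, not ICU, and `strip_prefix` is a hypothetical helper name. The class `[\W\d_]` is a Python idiom for "anything that is not a letter"; in ICU the shorter `^\P{L}+` should express the same idea, but treat that as an assumption to verify.

```python
import re

# One or more characters at the start of the name that are not letters:
# \W = non-word characters, \d = digits, _ = underscore.
LEADING_NON_LETTERS = re.compile(r"^[\W\d_]+")

def strip_prefix(name):
    """Drop everything in front of the first letter; keep the rest unchanged."""
    return LEADING_NON_LETTERS.sub("", name)

print(strip_prefix("34523-4523645__ 23452646*#+Name"))  # Name
print(strip_prefix("Name"))                             # Name
```

The `+` quantifier is what handles the "amount unknown" part: it consumes as many non-letters as there are, so no loop is needed. If nothing precedes the first letter, the pattern does not match and the name is returned as it was.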
I am currently reading through the ICU site to see how to evaluate an undefined number of characters.
Screams do…while but yeah
EDIT:
My current approach looks like this, but it doesn’t work as intended
I told you already that you have to use regular expressions to achieve what you want. And the rule I posted should only affect those files matching the pattern, i.e. those not already in the desired format.
There’s, I believe, no point in trying to match the name with something like ![…] but I’m repeating myself.
As soon as you can describe your pattern clearly and unambiguously, let me know. I can then try to figure out an expression to match.
I really don’t want to annoy you or overstress your patience. I know you are a very active and capable user here on the forum. I’ve already seen a lot of your posts here, and I am very happy that you are trying to help me out.
So first of all to prevent any misunderstandings. This is what I want to achieve:
A file comes into the inbox from whatever source, and it has an unknown number of numbers, symbols or placeholders in front of the actual name. Take this file as an example; it could also be any other combination:
34523-4523645__ 23452646*#+Name
I want to format it to
YYYY_MM_DD_Name
Where the date information is taken from the creation date of the file.
But if a file already has the format YYYY_MM_DD_Name, it should not be touched.
I found this solution, which is not perfect: it cannot differentiate between YYYY_MM_DD_Name and YYYY-MM-DD-Name, and so it leaves both files untouched. But the YYYY-MM-DD-Name version should preferably be changed to YYYY_MM_DD_Name
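For illustration, this is how a plain regular expression tells the two variants apart. It is a Python sketch rather than a smart rule condition, and the function names are hypothetical. Inside a regular expression both `_` and `-` (outside a character class) are literal characters, so the underscore pattern does not accept hyphens.

```python
import re

# Matches only names that start with YYYY_MM_DD_ (literal underscores).
UNDERSCORE_DATED = re.compile(r"^\d{4}_\d\d_\d\d_")
# Matches names that start with YYYY-MM-DD- (literal hyphens).
HYPHEN_DATED = re.compile(r"^(\d{4})-(\d\d)-(\d\d)-")

def is_final_format(name):
    """True only for names already in the YYYY_MM_DD_Name format."""
    return UNDERSCORE_DATED.match(name) is not None

def normalize_hyphens(name):
    """Turn a leading YYYY-MM-DD- into YYYY_MM_DD_; leave other names alone."""
    return HYPHEN_DATED.sub(r"\1_\2_\3_", name)

print(is_final_format("2021_12_19_Name"))    # True
print(is_final_format("2021-12-19-Name"))    # False
print(normalize_hyphens("2021-12-19-Name"))  # 2021_12_19_Name
```

This sketch keeps the date that is already in the hyphenated name; it does not replace it with the file's creation date.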
You are looking for a holy grail and not going to find one.
A file comes into the inbox from whatever source, and it has an unknown number of numbers, symbols or placeholders in front of the actual name. Take this file as an example; it could also be any other combination:
This is a conceptual error.
You need uniformity to automate things. Even exceptions - of which there shouldn’t be any or VERY few - should be eliminated, if possible.
34523-4523645__ 23452646*#+Name
I want to format it to YYYY_MM_DD_Name
Where the date information is taken from the creation date of the file.
Then what does your inquiry here have to do with the filename?
It sounds like you just want to strip out part of the filename and replace it with the creation date of the file in YYYY_MM_DD format.
Also, you still need to define a delimiter between the things to remove and the Name. Is it *, #, or + in your example filename?
What I am trying to achieve is to remove every non-letter in front of the first letter, which is presumably the start of the file name, at least in 99.99% of my cases. The rest I can cover manually.
The rule I created is very close to what I want to achieve. There is just one problem: the pre-filtering is too generous. It also protects YYYY-MM-DD-Name, not just YYYY_MM_DD_Name.
If there is a way to achieve this last bit, I will be completely happy.
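Put together, the whole rule described here could look like the following Python sketch. It is not a DEVONthink smart rule: the function `rename` is hypothetical, and the creation date is passed in as a `datetime.date` instead of being read from the file. Names that already start with YYYY_MM_DD_ are left alone; everything else, including YYYY-MM-DD-Name, loses its leading non-letters and gets the creation date as a prefix.

```python
import re
from datetime import date

# Names already in the target format are protected.
FINAL_FORMAT = re.compile(r"^\d{4}_\d\d_\d\d_")
# Leading run of characters that are not letters (Python idiom).
LEADING_NON_LETTERS = re.compile(r"^[\W\d_]+")

def rename(name, created):
    """Return the new file name for a record created on the given date."""
    if FINAL_FORMAT.match(name):
        return name  # already YYYY_MM_DD_Name, do not touch
    stem = LEADING_NON_LETTERS.sub("", name)
    return f"{created:%Y_%m_%d}_{stem}"

print(rename("34523-4523645__ 23452646*#+Name", date(2021, 12, 19)))
# 2021_12_19_Name
print(rename("2021-12-19-Name", date(2021, 12, 20)))
# 2021_12_20_Name
print(rename("2021_12_19_Name", date(2021, 12, 20)))
# 2021_12_19_Name
```

One consequence of this design: for a hyphenated name, the date already in the name is discarded and the creation date is used instead, which may or may not be what is wanted.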
You want to keep only the alphabetical part of the name and prepend it with a formatted version of the creation date.
Your perceived problem is that the pattern you use to match the files to process does not match all the files.
The last difficulty results from the “special” way DT uses to match in normal searches. There’s nothing you can do about that in this context.
What I’d do: Remove the second selection criterion. Simply change all file names to the format you want and then move them to a group. As soon as they are out of the inbox, you don’t have to worry about them anymore. Alternatively, set a tag like “processed” or whatever after you change the name and add “tag is not processed” to the conditions.
As an aside: I’d go about this very differently. Instead of looking at the file name as it arrives in the inbox, I look at the content of the document (possibly after OCRing) and determine the file name from there. I wrote about that here recently (look for “Hazel”). And then I move the file where it belongs. This, of course, is only sensible if one receives similar documents regularly, and that’s the whole point of automation. No use if I get an invoice from someone once a year, because scripting or otherwise automating that takes more time than simply changing the data by hand.
I already have your post about processing with a script in my bookmarks, and I am planning to work through it.
But for now I am not entirely sure what is worth renaming automatically, since I don’t have that many identical documents arriving every month or week that are worth archiving.
The correction of the prepended date is in any case a daily annoyance for me with every document I receive.
I used to have a very branched folder structure. I am not sure whether that is useful in DT. I was planning to start with a small number of folders and use the AI search of DT, which is so highly promoted. I don’t know if this will work out.
It all depends on what you’re sorting in the database(s). In my case, it is mostly financial stuff: invoices, bank statements and so on. Some of it is regular (utility invoices, account statements), some is not. I scripted everything that arrives at least once per month. And I use only three groups: expenses, income and banks, the latter with subgroups for each account. In addition, I use tagging, which is moderately useful for me.
Not always on date of the document context. Evaluating this is also on my todo list. But first things first. This gives me at least an easy sort by name capability. What I hate the most is files mixing up in sorting because of inaccurate naming.
I want to thank @chrillek for your post on December 15, 2021. I had been searching for this very solution and found this post by searching on “Scan Name”. Your description of the expression needed to evaluate an existing Name and your reference to the DT3 documentation was invaluable. I have been a DT user since 2011 and had used a date convention in naming files that I recently wanted to change to add dashes between the year, month and day. With the help of your example, I was able to create a smart rule that does exactly what I need and works every time. Greatly appreciate your post.