Smart Rule wildcards

It is basically the rule as mentioned above.

Is there documentation of all the functions that are available in smart rules, and of the operators that are available, e.g. for checking for letters, special symbols, etc.?

The help only describes what each function does, but not how to make it do that.

You’re using a regular expression in a search. That does not work, you have to use a scan rule.
The normal search does not recognize regular expressions.

Ah okay, could you please show me an example of what you mean? I am new to Devonthink and right now I can’t figure it out.

My rule actually finds everything that doesn’t start with the 2021_12_19_ or 2021-12-19- format.

I would like to narrow it down so that it really operates on everything that doesn’t start with 2021_12_19_[letter], i.e. using underscores and having a letter after the date.

The next step would be to delete everything before the first real letter and replace it with the date.

Did you read the documentation on smart rules? The usage of scan text/name is explained there.

I checked the help file. The function of Scan Name with string, date, etc. is mentioned there, but not the syntax for using it. At least, I was not able to figure it out.

I am really not trying to play the lazy card here; I would love to understand it and do it myself.

I have to say I am not that experienced in programming of any kind. I have just very basic knowledge in that field.

OK, I’m quoting the relevant parts from the documentation.

Item scanning: The next two actions allow you to scan the name or text of a document and use the results when found. …

  • Scan Name: Scans the name of the file.

    The following four parameters are used with the Scan Name and Scan Text actions. …
  • Regular Expression: Items in parentheses are captured; items outside parentheses are ignored. You can specify multiple captures in an expression. Using the captured text in subsequent actions is specified by using backslash, \, and the number of the capture, starting at 1. Note we use Apple’s NSRegularExpression which supports the ICU regular expression syntax.

So you have to use a regular expression, because it is not possible otherwise to refer to parts of the match.
The expression itself should look like this
(\d{4})(\d\d)(\d\d)(?:_?\d{4})(-.*)
and the replacement like this
\1_\2_\3\4

Explanation: \d is RegEx’ish for [0-9], think of “digits”. So we’re first capturing 4, 2, and another 2 digits in the capturing groups (delimited by parentheses) 1, 2 and 3. Then comes a non-capturing group (?:...). It looks for an optional (?) underscore, followed by four digits (\d{4}). The last capturing group is (-.*), i.e. the “-Name” part in your example. This group is #4, since non-capturing groups are not counted.

In the replacement part, you simply string together the capturing groups 1 through 4 with the underscores you want.

Since the rule only matches the file names you described, it should do what you want. Please test first with copies! And adjust the rule accordingly if the part after the date is not four digits…
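To see what the expression and replacement above do before trying them on real files, here is a minimal sketch in Python. It is only an approximation: Python's `re` module is not the ICU engine DEVONthink uses, although the constructs used here (`\d`, `{4}`, `(?:...)`, numbered backreferences) behave the same way in both. The function name `rename` is made up for this illustration.

```python
import re

# Pattern and replacement exactly as posted above.
PATTERN = re.compile(r"(\d{4})(\d\d)(\d\d)(?:_?\d{4})(-.*)")
REPLACEMENT = r"\1_\2_\3\4"


def rename(name: str) -> str:
    """Return the reformatted name, or the name unchanged if nothing matches."""
    return PATTERN.sub(REPLACEMENT, name)


print(rename("20211219_1200-Name"))  # 2021_12_19-Name
print(rename("2021_12_19_Name"))     # 2021_12_19_Name (no match, left alone)
```

Note that group 4 includes the hyphen, so the result keeps it. If an underscore is wanted there instead, one option would be to capture only the part after the hyphen with `-(.*)` and use `\1_\2_\3_\4` as the replacement.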

Not in criteria, it’s not possible at this time.

In the Scan Text or Scan Name smart rule action, it is when you use a Regular Expression method. However, the smart rule would have to identify the matching files first.

Thank you very much I will try it out. That really is something new I didn’t find. I will also check the help again to see where I missed what you posted.

BR AWD

It is in the link to the ICU definition.

Thanks for the confirmation!

Thank you again, that explanation gave me a better understanding of how ICU RegEx works.

But I have one thing I do not understand here.

Since I want to modify all files that do not have a certain formatting, in my understanding I have to find every file that does not have that format. I do that with:

![0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[a-z]*

This does not work perfectly, since the “_” also accepts other symbols like “- + #” as correct formatting.
I tried to use some RegEx expressions with Unicode to identify the “_”, but the filter didn’t accept any of that.

When I understand your example correctly the string

(\d{4})(\d\d)(\d\d)(?:_?\d{4})(-.*)

Takes apart a name that looks like this

20211219_1200-Name

and reformats it to

2021_12_19_Name

That is very interesting for me, since I learned a bit more about RegEx. But what I want to do is:
Take any filename that is not in the format

2021_12_19_Name

Check if there are any numbers or symbols in front of the first letter, which is presumably the beginning of the name, and delete them. If there is nothing in front of the first letter, nothing is deleted.

The next step is to prefix the creation date, which is super simple with change name.

The parts I cannot figure out are:

  • Have the filter match underscore characters exactly
  • Figure out if there are numbers and/or symbols in front of the first letter (amount unknown)
  • Delete them, or just take the part starting with the first letter over to the change name function.
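The last two points of the list can be handled by a single regular expression that skips any run of non-letters at the start and captures the rest. The sketch below tries this out in Python; the pattern, the function name `strip_prefix` and the sample names are assumptions made for illustration, and the pattern has not been tested inside a Scan Name action.

```python
import re

# Hypothetical pattern: skip every leading character that is not an
# ASCII letter, then capture the remainder of the name as group 1.
LEADING_JUNK = re.compile(r"^[^A-Za-z]*(.*)$", re.DOTALL)


def strip_prefix(name: str) -> str:
    """Return the name starting at its first ASCII letter."""
    match = LEADING_JUNK.match(name)
    # The pattern matches every string, so match is never None here.
    return match.group(1)


print(strip_prefix("123-45__Invoice"))  # Invoice
print(strip_prefix("Invoice"))          # Invoice (nothing to remove)
```

The character class `[^A-Za-z]` only knows ASCII letters, so a name starting with an umlaut would lose that character. ICU offers `\p{L}` for "any letter", which Python's built-in `re` module does not support.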

I am currently reading through the ICU site to see how to evaluate an undefined number of characters.
Screams do…while but yeah :wink:

EDIT:

My current Approach looks like this but doesn’t work as intended

I told you already that you have to use regular expressions to achieve what you want. And the rule I posted should only match those files that fit the pattern, i.e. those not already in the desired format.

There’s, I believe, no point in trying to match the name with something like ![…] but I’m repeating myself.

As soon as you can describe your pattern clearly and unambiguously, let me know. I can then try to figure out an expression to match.

I really don’t want to annoy you or over stress your patience. I know you are a very active and capable user here in the forum. I’ve already seen a lot of your posts here and I am very happy that you are trying to help me out.

So first of all to prevent any misunderstandings. This is what I want to achieve:

A file comes into the inbox from whatever source, and it has an unknown number of numbers, symbols or placeholders in front of the actual name. Take this file as an example; it could also be any other combination:

34523-4523645__ 23452646*#+Name

I want to format it to

YYYY_MM_DD_Name

Where the date information is taken from the creation date of the file.

But if there is a file that already has the format YYYY_MM_DD_Name, it should not be touched.

I found this solution, which is not perfect, since it cannot differentiate between YYYY_MM_DD_Name and YYYY-MM-DD-Name and therefore leaves both files untouched. But the YYYY-MM-DD-Name version should preferably be changed to YYYY_MM_DD_Name.
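In a regular expression, unlike in the search criteria, an underscore is just a literal underscore, so the two formats can be told apart. The following Python sketch combines that check with the renaming described above. The names `TARGET_FORMAT` and `normalize` are hypothetical, the creation date is passed in by hand rather than read from a file, and Python's `re` stands in for the ICU engine.

```python
import re
from datetime import date

# Literal underscores only: a hyphen in the same position does not match.
TARGET_FORMAT = re.compile(r"^\d{4}_\d{2}_\d{2}_[A-Za-z]")


def normalize(name: str, created: date) -> str:
    """Leave YYYY_MM_DD_Name alone; otherwise strip the prefix and add the date."""
    if TARGET_FORMAT.match(name):
        return name
    # Drop everything in front of the first ASCII letter.
    bare = re.sub(r"^[^A-Za-z]+", "", name)
    return f"{created:%Y_%m_%d}_{bare}"


created = date(2021, 12, 19)
print(normalize("34523-4523645__ 23452646*#+Name", created))  # 2021_12_19_Name
print(normalize("2021-12-19-Name", created))                  # 2021_12_19_Name
print(normalize("2021_12_19_Name", date(2022, 1, 1)))         # 2021_12_19_Name
```

In this sketch, the date in a renamed file always comes from the creation date, not from whatever date the old prefix happened to contain.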

My partial achievement looks like this:

If you are in the mood to show me a better solution, I would be happy to learn.

You are looking for a holy grail and not going to find one.

A file comes into the inbox from whatever source, and it has an unknown number of numbers, symbols or placeholders in front of the actual name. Take this file as an example; it could also be any other combination:

This is a conceptual error.
You need uniformity to automate things. Even exceptions - of which there shouldn’t be any or VERY few - should be eliminated, if possible.

34523-4523645__ 23452646*#+Name
I want to format it to
YYYY_MM_DD_Name
Where the date information is taken from the creation date of the file.

Then what does your inquiry here have to do with the filename?
It sounds like you just want to strip out part of the filename and replace it with the creation date of the file in YYYY_MM_DD format.

Also you still need to define a delimiter between things to remove and the Name. Is it *, #, or + in your example filename?

Hi Bluefrog,

What I am trying to achieve is to remove every non-letter in front of the first letter, which is presumably the start of the file name, at least in 99.99% of my cases. The rest I can cover manually.

The rule I created is very near to what I want to achieve. There is just the problem that the pre-filtering is too generous: it also protects YYYY-MM-DD-Name, not just YYYY_MM_DD_Name.

If there is a way to achieve this last bit, I am completely happy.

I’ll try to summarize in my own words:

  • You have files coming into inbox
  • They can have “any name”
  • You want to keep only the alphabetical part of the name and prepend it with a formatted version of the creation date.
  • Your perceived problem is that the pattern you use to match the files to process does not match all the files.

The last difficulty results from the “special” way DT uses to match in normal searches. There’s nothing you can do about that in this context.

What I’d do: Remove the second selection criteria. Simply change all file names to the format you want and then move them to a group. As soon as they are out of the inbox, you don’t have to worry about them anymore. Alternatively, set a tag like “processed” or whatever after you change the name and add “tag is not processed” to the conditions.
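The tag variant can be pictured with a small Python model. This is a sketch only: `Record` is a hypothetical stand-in for an inbox item, not a DEVONthink API, and the tag name "processed" is just the example from the paragraph above.

```python
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    """Hypothetical stand-in for an inbox item; not a DEVONthink API."""
    name: str
    created: date
    tags: set[str] = field(default_factory=set)


def process(record: Record) -> None:
    """Rename every untagged record, then tag it so it is skipped next time."""
    if "processed" in record.tags:
        return
    # Keep the part starting at the first ASCII letter, prefix the creation date.
    bare = re.sub(r"^[^A-Za-z]+", "", record.name)
    record.name = f"{record.created:%Y_%m_%d}_{bare}"
    record.tags.add("processed")


item = Record("99__Name", date(2021, 12, 19))
process(item)
process(item)  # second run is a no-op because of the tag
print(item.name)  # 2021_12_19_Name
```

With this approach the rule no longer has to recognize the name format at all; the tag carries the "already done" information.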

As an aside: I’d go about this very differently. Instead of looking at the file name as it arrives in the inbox, I look at the content of the document (possibly after OCRing) and determine the file name from there. I wrote about that here recently (look for “Hazel”). And then I move the file where it belongs. This, of course, is only sensible if one receives similar documents regularly – and that’s the whole point of automation. No use if I get an invoice from someone once a year, because scripting or otherwise automating that takes more time than simply changing the data by hand.

Yes, that is correct. I’ve now modified my rule, inspired by your idea of simply processing everything.

I’ve now split it into 3 rules that run one after the other. I guess it is not the most sophisticated solution, but for now it does what I want.



I already have your post about processing with a script in my bookmarks and I am planning to work through it.
But for now I am not entirely sure what is worth being renamed automatically, since I don’t have that many identical documents coming in every month or week that are worth archiving.
The correction of the prepended date is in any case a daily annoyance for me with every document I receive.

I used to have a deeply branched folder structure. I am not sure if that is useful in DT. I was planning to start with a small number of folders and use the AI search of DT, which is so highly promoted. I don’t know if this will work out.

It all depends on what you’re sorting in the database(s). In my case, it is mostly financial stuff: invoices, bank statements and so on. Some of it regular (utility invoices, account statements), others not. I scripted for everything arriving at least once per month. And I use only three groups: expenses, income and banks, the latter with subgroups for each account. In addition, I used tagging, which is medium useful for me.

Thank you for the tips.
Those kinds of documents were also on my list: insurance, financial, house-related and family-related things.

In front of what “first letter”?

And you said you want to use the creation date of the file. Is that accurate?