Special REGEX in Batch processing makes DEVONthink inoperable

Hello there,

I tried this regex in batch processing:

Scan Text [#\*Xx]{4,12}(\d{4})|(BAR)|(Bar)
Change Name %recordName%___\1\2\3

When I try to run it, the dialog window gets stuck and nothing happens anymore:
[Screenshot: CleanShot 2020-07-16 at 19.03.53]

I have to force-quit DEVONthink then :cry:

(EDIT: The bug only appears if the document matches the regex.)

The attentive reader will see that I made a beginner’s mistake there and did NOT escape the asterisk ( * ). Of course the first bracket should look like this: [#\*Xx] to make things work - but it does not work either way…
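As a side check on the escaping question: in Python's `re` module an asterisk inside a character class is a literal character with or without the backslash. The sketch below only shows that behaviour for Python; whether DEVONthink's regex engine treats it the same way is an assumption, and the sample strings are made up.

```python
import re

# Same character class, once without and once with the backslash.
unescaped = re.compile(r"[#*Xx]{4,12}")
escaped = re.compile(r"[#\*Xx]{4,12}")

# Made-up sample strings, not taken from the real documents.
samples = ["****", "##**xX", "XXXXXXXXXXXX", "***", "abcd"]

def same_result(text):
    """True if both spellings agree on whether the whole text matches."""
    return bool(unescaped.fullmatch(text)) == bool(escaped.fullmatch(text))

for text in samples:
    print(repr(text), bool(unescaped.fullmatch(text)), same_result(text))
```

If the two spellings agree in the engine you use as well, the missing backslash is unlikely to be what causes the hang.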

I’m not seeing any issue here.

  • What is a name that’s supposed to match?

  • Why are you entering %recordName%?

Looping?

The issue is that I have to force-quit DEVONthink. How could that not be an issue? :sweat_smile:
%recordName% just got there through copying.

I meant I am not able to reproduce the issue here yet.

It’s interesting: I tried it again on some files, and it seems to happen only on some of them. I’ve checked the file names and they are very similar (including # and ). It definitely only happens if the regex finds a match.

I renamed the file to 2019.pdf now, so it should work fine.
Still the same. I also tried it with ((BAR)|Bar|gegeben) - no luck. Should I send the file to you?

I’m not sure but I suggest that you rethink your capturing groups. If I understand correctly, you’re looking for

  • between 4 and 12 occurrences of #, *, x or X
  • followed by
    • either exactly 4 digits (which you capture in group 1)
    • or the word BAR (which again you capture in group 1)
    • or the word “Bar”, which you capture again in group 1

In any case, there can only be one captured group which contains either four digits or BAR or Bar.
Disclaimer: I may be wrong about the first alternation, which could also contain the 4-12 characters from [#*xX].
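On the point in the disclaimer: in Python's `re` module, at least, a top-level `|` splits the whole pattern, so the `[#\*Xx]{4,12}` prefix belongs only to the first alternative and `BAR` or `Bar` can match on their own. The sketch below compares the pattern from the first post with a variant that wraps the alternatives in one group; the variant and the sample strings are assumptions for illustration, not something posted in this thread.

```python
import re

# Pattern as posted: the prefix applies only to the \d{4} alternative.
original = re.compile(r"[#\*Xx]{4,12}(\d{4})|(BAR)|(Bar)")

# Hypothetical variant: the prefix is required before every alternative.
grouped = re.compile(r"[#\*Xx]{4,12}(\d{4}|BAR|Bar)")

def matched_text(pattern, text):
    """Return the text matched by the first hit, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

for text in ["paid BAR", "####BAR", "XXXX1234"]:
    print(repr(text), matched_text(original, text), matched_text(grouped, text))
```

Which of the two readings is intended decides whether the extra group is needed.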

Hint: Use only \1 in your replacement string. You have three alternations (|) only one of which can be true. So there’s only one reference set, not three.

Hint: Use /i (if possible at all) in your regex if you don’t care for capitalization.

Last Hint: Use one of the internet tools to check your regex and possibly the replacement string. If that’s not possible, test it with Perl/Python/JavaScript. That’s usually simpler than throwing it at DT which doesn’t tell you why it is unhappy.
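A minimal way to try the pattern and the replacement string together outside DEVONthink, here in Python. The function `new_name` is hypothetical and only imitates a "Scan Text" followed by "Change Name"; it is not DEVONthink's implementation, and its engine may resolve `\1\2\3` differently. The sample texts are made up.

```python
import re

PATTERN = re.compile(r"[#\*Xx]{4,12}(\d{4})|(BAR)|(Bar)")
TEMPLATE = r"%recordName%___\1\2\3"

def new_name(record_name, text):
    """Imitate the rename: scan `text`, then build the new name from the template."""
    m = PATTERN.search(text)
    if m is None:
        # No match: the name stays as it is.
        return record_name
    # Since Python 3.5, groups that did not take part in the match expand to "".
    expanded = m.expand(TEMPLATE)
    return expanded.replace("%recordName%", record_name)

print(new_name("Invoice", "Card ####1234"))
print(new_name("Invoice", "Paid BAR"))
print(new_name("Invoice", "nothing relevant"))
```

Running a handful of real document texts through a harness like this shows quickly whether the pattern or the replacement string is the part that misbehaves.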

Suggestion: Don’t stuff everything into your filename. You can use tags, (user) metadata fields and so on for additional information. It might be more helpful to keep the names short, meaningful and easy to read, and to put related information (like amount, kind of payment) into the metadata (in the broad sense).

Which version of macOS do you use? In addition, please choose Help > Report Bug while pressing the Alt modifier key and send the result to cgrunenberg - at - devon-technologies.com - maybe the logs contain some hints. Thanks!

MacBook Pro (16-inch, 2019)
with Catalina 10.15.5
and DEVONthink 3.5.1

just sent the rest to you :slight_smile: Thanks!!!

Please tell me if I’m wrong, but the replacement string in the GIF refers to the first formula, so as far as I know I should have three references:

[#\*Xx]{4,12}(\d{4})|(BAR)|(Bar)
-------------^FIRST^-^2ND^-^3rd^

Am I right? So if I want to show the one that is found I have to reference all of them with
%recordName%___\1\2\3

Or did I get the concept wrong here?

This I did, of course (after having to manually rename 12 PDFs, as I found out that there’s no Undo for batch processing - maybe a #feature-request :laughing:). I own Expressions https://www.apptorium.com/expressions and I test EVERY regex against either all the file names or some files’ contents (after converting to PDF+Text) before I throw it at my files.

Ah see, I didn’t know that, yet! Thanks man :smile::+1:

Unfortunately I have to, because most of the documents are for tax purposes and not all of the metadata is readable by the different apps that my docs have to be pushed through. So I found this to be the most conservative and convenient way.

I also wrote an expander regex that formats
Car+190314+48.29+7812+Gas Station NYC .pdf
to
2019— #Car $48.29 – Gas Station NYC – #CreditCard7812 —2019-03-14.pdf
That made manual renaming about three times faster (sometimes OCR doesn’t kick in at all and I don’t know why…). After I’m done with all files, I use the script I found here to take the date out of the filename and save it as the creation date. Then I go to “Use tags of filename” (I don’t know the exact English name) and I have the tags Car and CreditCard7812, which I love :heart:
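The expansion from the short form to the long file name could be sketched in Python like this. It is not the expander regex mentioned above, which was not posted. The field layout (tag, YYMMDD date, amount, last four card digits, payee), the name `expand_name` and the "20" century prefix are all assumptions read off the single example.

```python
import re

# Hypothetical short form: tag+YYMMDD+amount+card+payee.pdf
SHORT_FORM = re.compile(
    r"^(?P<tag>[^+]+)\+(?P<yy>\d{2})(?P<mm>\d{2})(?P<dd>\d{2})"
    r"\+(?P<amount>[\d.]+)\+(?P<card>\d{4})\+(?P<payee>.+?)\s*\.pdf$"
)

# The long and short dashes used as separators in the example file name.
LONG_DASH, SHORT_DASH = "\u2014", "\u2013"

def expand_name(short_name):
    """Turn the short form into the long file name; return it unchanged if it does not fit."""
    m = SHORT_FORM.match(short_name)
    if m is None:
        return short_name
    year = "20" + m["yy"]  # assumption: all dates fall in the 2000s
    return (
        f"{year}{LONG_DASH} #{m['tag']} ${m['amount']} {SHORT_DASH} {m['payee']} "
        f"{SHORT_DASH} #CreditCard{m['card']} {LONG_DASH}{year}-{m['mm']}-{m['dd']}.pdf"
    )

print(expand_name("Car+190314+48.29+7812+Gas Station NYC .pdf"))
```

Names that do not fit the short form are returned untouched, so a typo in the input cannot produce a half-converted file name.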

Yes. The alternation (|) can only be true for one alternative; processing stops after the first match.

Consider the regex /(a)|(b)|(c)/ and the string “bar”. The regex engine first looks at the “b” and compares it to (a): no match. Then it compares the “b” to (b): match. Since there was no match for (a), this parenthesis becomes the FIRST capturing group. See here. Consequently, in your case there’s never anything else but a capturing group 1. You might have been thinking of nested capturing groups like so
(\d{4}(Bar|bar)). In this case, “2019Bar” is \1, and “Bar” is \2.
Unfortunately, the power of REs comes with a lot of pitfalls.
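Group numbering is easy to check directly. In Python's `re` module, at least, groups are numbered by the position of their opening parenthesis, whichever alternative ends up matching, and groups that took no part in the match stay empty. That differs from the numbering described above, so it is worth verifying which behaviour the engine you are using follows; that DEVONthink behaves like Python here is an assumption.

```python
import re

# Alternation: which group holds the "b" of "bar"?
m = re.search(r"(a)|(b)|(c)", "bar")
alternation_groups = m.groups()     # one entry per group, None if unused
matched_group_number = m.lastindex  # number of the group that matched
combined = m.expand(r"\1\2\3")      # unused groups expand to "" (Python 3.5+)

# Nested groups: the outer parenthesis opens first, so it is group 1.
nested = re.search(r"(\d{4}(Bar|bar))", "2019Bar")
nested_groups = nested.groups()

print(alternation_groups, matched_group_number, combined)
print(nested_groups)
```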

I apologize, I didn’t want to sound condescending. Last time I was playing around with REs, I found the online tools useful, but in your case, it’s probably the replacement string that is causing the trouble.

As to the file names: As I said in another thread, depending on your tax system, you could perhaps get by with a table generated in DT that contains the necessary data (like record name, amount, purpose, whatnot) in separate columns. I had posted a script for that in the forum’s automation section some weeks ago. So you can use Meta inside DT and just massage the data a bit before exporting it to other apps. Maybe.

No, no! Definitely no need to apologize! Mentioning the online tools is great, as you can’t know how I work with my files. If I didn’t have Expressions (because I always wanted to learn more regex but always skipped it), the online tools would be awesome. And if I hadn’t known about them, I’d be very lucky!!
And, by the way, really: thanks for the time you spend helping with my issues - really appreciate that! :four_leaf_clover:

What was the original file name causing the issue?

That was just a test - the file name did not cause the issue. It was caused by the content and the batch processing (maybe a wrong regex or a wrong reference in the name change - still no clue).