Renaming files according to a regex pattern

Hi. I’ve read a number of the other posts dealing with this issue, and I seem to be doing what you recommend in those other posts (as well as in the manual).

The regular expression is: (ABC\-\w{3}\-\d{8})

I.e. to match some text like ABC-JLU-08798799

I’ve tried this as a ‘batch process’, which won’t even let me click past ‘OK’; it just refuses to accept it.

I can set this up as a smart rule, but there I’m choosing the option ‘content matches’, and I’m unsure whether that does anything with regular expressions or not.

I’d like to rename the files to the match of that regex, so I type \1 as the renaming parameter.

Neither of these options works. Can anyone explain what’s going on, or how I should be doing this better?

Thanks!

In the smart rule’s Actions panel, select “scan text” and “regular expression”. In the next (or a following) action line, select “change name” and enter \1.

Edit: I haven’t used Batch Process, but it appears to have the same selections as the smart rule: Scan Text > Regular Expression, then Change Name.
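To make the mechanics concrete: the regex scan picks the first match out of the document’s text, and \1 in the Change Name action refers to the first capturing group of that match. The JavaScript below is only an analogy under that assumption, not DEVONthink’s implementation; the helper `nameFromText` and the sample text are made up for illustration.

```javascript
// Same pattern as in the question; "\-" outside a character class is just "-".
const idPattern = /(ABC\-\w{3}\-\d{8})/;

// Hypothetical helper: return the first capturing group (what \1 refers to),
// or keep the current name when the text contains no match.
function nameFromText(text, currentName) {
  const match = text.match(idPattern);
  return match ? match[1] : currentName;
}

const sampleText = "Reference: ABC-JLU-08798799, received 3 March.";
console.log(nameFromText(sampleText, "Scan 001"));          // ABC-JLU-08798799
console.log(nameFromText("No reference here", "Scan 001")); // Scan 001
```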

@strickvl:
Tested batch process with your example and it works. Smart Rule looks the same.

@wmc is correct and I have confirmed it’s working here too.

Thank you @BLUEFROG and @wmc. I’m not sure what was going on previously, but it is working now. Perhaps it had something to do with the folders being batch processed containing very large numbers of documents. I have now tried it with a very small test batch, and that works. I guess I’ll have to do this with smaller batches each time rather than 100,000+ in one go…

Yes, I would strongly advocate this. As I’ve mentioned on these forums more than once, my father advised me, “You know how to eat an elephant? One bite at a time.”

I seem to have encountered the same problem even in the Finder. Some operations (like deleting files) appear to work only with c. 15,000 files at a time. If you give it more than 20,000, it will just spin and spin.

From what various other people and Stack Overflow say, I think I hit some sort of memory barrier: it tries to load the entire set of files (or the list of all the names) into memory, or something like that, and above a certain point it’s just too many. There are other ways of batch deleting files in the Terminal, but these amount to doing what you just said: you iterate through the list of files, deleting each one as you get to it. Whatever DEVONthink does in its batch process, it seems to be more like the former than the latter.

In any case, lesson learned :wink:
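The “one bite at a time” approach can be sketched as splitting the list into fixed-size batches and finishing each batch before starting the next. This is a generic JavaScript illustration only; `chunk`, `processInBatches` and the `handle` callback are hypothetical names, and nothing here reflects how DEVONthink or the Finder manage memory internally.

```javascript
// Hypothetical helper: split a long list into batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Work through the batches in order; `handle` is a placeholder for the
// per-item job (rename, delete, ...), not a DEVONthink API.
function processInBatches(items, size, handle) {
  let processed = 0;
  for (const batch of chunk(items, size)) {
    for (const item of batch) {
      handle(item);
      processed++;
    }
  }
  return processed;
}

const names = Array.from({ length: 25 }, (_, i) => `file-${i}.pdf`);
console.log(chunk(names, 10).map(b => b.length).join(", ")); // 10, 10, 5
```

In practice you would pick a batch size that your tool handles comfortably (the post above mentions around 15,000 files in the Finder) and run the batches one after another.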

Hey, I did not want to start a new topic, as I have a similar question. After building and using my databases for several years now, I have noticed a flaw in my file naming procedure. All files are named:
YYYY-MM-DD Originator - Topic.pdf

Originator can be a company, shop, authority - it can be more than one word
Originator and Topic are separated by a hyphen
Topic can be multiple words

Now this sorting is nice and useful for most folders - but I figured that sometimes it makes more sense to have it like:
Originator - YYYY-MM-DD - Topic

And this is my question, can I use RegEx to manipulate the file names? In Finder I used betterRenamer for such things, and that would be a workaround. But I would like to use more of the features in DT… any hint?

Thanks + Regards,
Nils

Yes. There’s a script in DT’s script menu under the heading “Rename” (no surprise there). You could search for something like
(\d{4}-\d{2}-\d{2})\s+([^-]+)\s+-\s+(.*)\.pdf
and then replace that with
\2 - \1 - \3.pdf
No guarantee, of course. And I’m not sure about the final .pdf: it’s quite possible that DT does not consider the extension part of the name, in which case you have to leave it out of both the RE and the replacement string.
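The same reordering can be tried out in JavaScript, where the replacement refers to groups as $1, $2, $3 instead of \1, \2, \3. This sketch assumes the Originator contains no hyphen (the [^-]+ group already implies that), and it makes the .pdf extension optional to sidestep the question of whether DT includes it in the name; `reorderName` and the sample names are hypothetical.

```javascript
// Groups: 1 = date, 2 = originator (no hyphens), 3 = topic, 4 = optional ".pdf"
const datedName = /^(\d{4}-\d{2}-\d{2})\s+([^-]+)\s+-\s+(.*?)(\.pdf)?$/;

function reorderName(name) {
  // If group 4 did not match, "$4" is replaced by an empty string.
  // Names that do not match the pattern are returned unchanged.
  return name.replace(datedName, "$2 - $1 - $3$4");
}

console.log(reorderName("2021-03-15 Acme Corp - Annual Report.pdf"));
// Acme Corp - 2021-03-15 - Annual Report.pdf
console.log(reorderName("2021-03-15 City Council - Parking Permit"));
// City Council - 2021-03-15 - Parking Permit
```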

Thanks for this answer; it helped me a lot to get started with regex-based renaming :smiley:

Just one remark: \d{4} didn’t work for me, but rewriting it as [0-9]{4} finally did it.
Of course, it would have been nice to get some error or log message pointing to why the renaming didn’t work (source regex not matching? replacement string not working? not enough / too many items?)…

Welcome @fex

There is no known issue with \d{4}. I just tested it without a problem. However, yes, the character range would also work.

My goal was to reformat file names from YYYY_MM_DD_text to YYYYMMDD_text.

At first I tried searching for (\d{4})_(\d{2})_(\d{2})_(.*) and replacing with \1\2\3_\4, and my file names didn’t change at all.

After changing the search pattern to ([0-9]{4})_([0-9]{2})_([0-9]{2})_(.*), with the same replacement \1\2\3_\4, my file names finally changed…

So for me it looks like the \d syntax was at fault - but I would be interested to hear what else it could be…

EDIT: DEVONthink V3.8
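One way to narrow down where the problem lies is to run both patterns through an engine that is known to support \d. In JavaScript, \d matches exactly the ASCII digits 0 to 9, so the two versions below give identical results; if they behave differently in a renaming tool, the cause is that tool’s regex flavor rather than the pattern itself. The sample file name and the helper `compactDate` are made up.

```javascript
const withShortcut = /(\d{4})_(\d{2})_(\d{2})_(.*)/;
const withRange = /([0-9]{4})_([0-9]{2})_([0-9]{2})_(.*)/;

// YYYY_MM_DD_text -> YYYYMMDD_text ($n is JavaScript's spelling of \n)
function compactDate(name, pattern) {
  return name.replace(pattern, "$1$2$3_$4");
}

const sample = "2022_05_09_meeting notes";
console.log(compactDate(sample, withShortcut)); // 20220509_meeting notes
console.log(compactDate(sample, withRange));    // 20220509_meeting notes
```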

And please ignore the typo in the filename in the RTFD :stuck_out_tongue: The actual filename is correctly shown in the window’s title bar.

Interesting :thinking:
Could that have anything to do with using the Rename with RegEx script vs Smart Rules?

Yes.
Were you using the script?

Yes

This script uses the command-line utility sed. Since I have no access to my Mac, I can’t check right now, but it’s possible that this tool still does not understand the newer (as in, more than 20 years old) syntax. Apple is not known for updating its command-line utilities often (macOS ships BSD-derived versions of tools like sed).

I can confirm that my Mac mini M1 doesn’t understand \d.

For regex, I really miss RegexBuddy on Windows.
Haven’t found any tools so nice as that on a Mac.
It was fantastic.

Thanks for the tip to use [0-9]. That works.

I use youtube-dl

youtube-dl --get-id 'PasteChannelOrPlaylistOrVideoHttpsAddressHere'| xargs -I '{}' -P 350 youtube-dl 'https://youtube.com/watch?v={}' -o '%(uploader)s (%(uploader_id)s)/%(upload_date)s YT %(title)s - (%(duration)ss) [%(resolution)s] [%(id)s]' --write-auto-sub --convert-subs=srt --write-description --write-info-json -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4'

The -P 350 will attempt to download up to 350 videos simultaneously, not that you will need that many. Make it 1 if you want one at a time. The line above gets the subtitles, description, video, and all the json info. I have some other scripts that parse and combine what I want to keep from that stuff… So for the mp4, I get this:

20171221 YT VideoTitle - (35s) [640x360] [VideoAddressCode]
and it gets dates as yyyymmdd

I prefer yyyy-mm-dd.
So the script Rename using RegEx works nicely.
Select the files you want to rename.
Run the script.
Source: ([0-9]{4})([0-9]{2})([0-9]{2})
Destination: \1-\2-\3

2017-12-21 YT VideoTitle - (35s) [640x360] [VideoAddressCode]

Done. Oh, and why don’t I just rename the file during the download? Because youtube-dl skips files it has already downloaded when a file with that name exists. So when I put the videos into DT3 for annotation, I rename them there and keep the original download folder handy in case I need to refresh it. Since I don’t want duplicates, this prevents downloading the same video twice.
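For comparison, here is the same Source/Destination pair expressed in JavaScript. It also shows a design choice worth considering: without a ^ anchor, the pattern rewrites the first run of eight digits anywhere in the name, so anchoring it to the start is safer when the date always comes first. The helper `dashDate` and the second sample name are hypothetical.

```javascript
const loose = /([0-9]{4})([0-9]{2})([0-9]{2})/;     // as in the post above
const anchored = /^([0-9]{4})([0-9]{2})([0-9]{2})/; // only a leading date

// yyyymmdd -> yyyy-mm-dd; $1-$2-$3 plays the role of \1-\2-\3
function dashDate(name, pattern) {
  return name.replace(pattern, "$1-$2-$3");
}

const example = "20171221 YT VideoTitle - (35s) [640x360] [VideoAddressCode]";
console.log(dashDate(example, anchored));
// 2017-12-21 YT VideoTitle - (35s) [640x360] [VideoAddressCode]

// Without ^, an 8-digit ID elsewhere in the name is also treated as a date:
console.log(dashDate("Clip [id 12345678]", loose));    // Clip [id 1234-56-78]
console.log(dashDate("Clip [id 12345678]", anchored)); // Clip [id 12345678]
```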

Thank you.

Have you tried Mark Alldritt’s RegexKnife? (He’s the guy behind Script Debugger.)

It’s a Mac Catalyst app and not verified for macOS, but it downloads from the Mac App Store and runs with no apparent issue.

In this generality, the statement is false. In fact, sed on macOS (regardless of the processor) does not support \d (and presumably not the other backslash shortcuts like \s, \D, etc.).
And since the script “Rename using Regex” uses sed, it doesn’t work with \d, which is a shame, to use a polite term.

Here’s a JavaScript version of the same script. It doesn’t rely on sed, so you can use \d etc. freely.

(() => {
  const app = Application('DEVONthink 3');
  app.includeStandardAdditions = true;

  const sel = app.selection();
  if (sel.length === 0) {
    app.displayAlert('Select at least one record.');
    return;
  }

  // Ask for the regular expression; an empty answer ends the script
  const searchFor = app.displayDialog('Search for RE', { defaultAnswer: '' }).textReturned;
  if (searchFor === '') return;

  // Show the RE again in the replacement dialog as a reminder
  const replaceWith = app.displayDialog(searchFor + '\nReplace with', { defaultAnswer: '' }).textReturned;

  const re = new RegExp(searchFor);
  sel.forEach(record => {
    // The replacement uses $1, $2, ... for capturing groups
    record.name = record.name().replace(re, replaceWith);
  });
})();

Unlike the original script, it also displays the RE in the replacement dialog. Note that in JavaScript you refer to capturing groups with a $, like $1, $2, $3, etc., instead of \1, \2, \3.

But if you’re after an automated process, I’d suggest incorporating that script in a smart rule like so:

function performsmartrule(records) {
  // DEVONthink passes in the records matched by the smart rule
  records.forEach(r => {
    // Turn a yyyymmdd date into yyyy-mm-dd
    r.name = r.name().replace(/(\d{4})(\d\d)(\d\d)/, "$1-$2-$3");
  });
}
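If you are used to typing \1, \2, … in the sed-based script, the replacement text could be translated to JavaScript’s $1, $2, … before it is passed to replace(). A minimal sketch: `sedToJsReplacement` is a hypothetical helper, and it only handles single-digit groups \1 to \9 (not sed’s & for the whole match).

```javascript
// Hypothetical helper: turn sed-style \1 ... \9 into JavaScript's $1 ... $9.
// In a replacement string "$$" is a literal "$", so "$$$1" emits "$" + digit.
function sedToJsReplacement(text) {
  return text.replace(/\\([1-9])/g, "$$$1");
}

const replacement = sedToJsReplacement("\\1-\\2-\\3");
console.log(replacement); // $1-$2-$3
console.log("20171221 YT clip".replace(/(\d{4})(\d\d)(\d\d)/, replacement));
// 2017-12-21 YT clip
```

Applied to the text returned by the “Replace with” dialog, this would let either notation work; text that already uses $1 passes through unchanged.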