[SOLVED] Need a bit of help with a simple (?) AI task - file renaming

uimike · May 3, 2025, 11:15am

Hi, I am slowly learning how to use the AI features, and could use some help with a simple (?) task…
I OCR’d a number of screen captures, and it would be great if I can rename the records to exactly what is in the 1st line. - check the image attached - how to go about it? Thanks!

I’d like the filename to read “1976 Dvurkralove Wildlife Park”

chrillek · May 3, 2025, 11:41am

Please post screenshots directly in your text.

I think AI is not needed for that kind of task. Something like

scan text for regular expression ‘^(.*)$’
Set name to ‘\1’

In a smart rule should be enough.

uimike · May 3, 2025, 11:46am

Did it, thanks Christian - will try and use your suggestion, thanks!

EDIT: hm, doesn’t seem to be working?
This is the batch action I created. When I apply it, the filename does not change.

chrillek · May 3, 2025, 11:53am

Since the screenshot must come from somewhere, there might also exist a possibility to ask the original app for the name of the group/folder/whatever it uses

uimike · May 3, 2025, 11:55am

I think probably not here, Christian, as it is just an Apple capture - if I understand what you are saying.

uimike · May 3, 2025, 12:09pm

Using the Script Debugger Explorer, I can see the following as the ‘plain text’ property of the record:

“1976 Dvurkralove Wildlife Park\n3. November WM: None Design: J. Baláz Perforation: 113/4 x 111/4\nI\n-\ny (l\nCáiMevenókí}\nOOh\nCuddlogngf...”

The first bit, up to the first newline (\n), is what I want. I think I could write an AppleScript for that (1), but using regex is fun if I can.

(1) changing the delimiters to “\n” maybe

chrillek · May 3, 2025, 12:14pm

AS is overkill for that case, as is AI. And if it’s just the screenshot of a folder contents, one could perhaps set up something to capture the folder name?
I mean, if you’re going through your complete collection, automation might be helpful …

uimike · May 3, 2025, 12:18pm

Ah, no - it will be hundreds of screenshots, taken from online stamp catalogs. As it happens, I adjust the screenshot selection area to start at the stamp set title. But there will be many, many, hence the usefulness of some automation.

Can you think of a js to capture the plain text until the first return / new line?

chrillek · May 3, 2025, 12:50pm

Easy:

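function performsmartrule(records) {
  records.forEach(r => {
    // OCR text of the record; its first line holds the stamp set title
    const txt = r.plainText();
    const newName = txt.split('\n')[0].trim();
    // Leave the name alone if the first line turns out to be empty
    if (newName) r.name = newName;
  })
}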

It gets the plain text attribute, splits it at newlines, which gives an array. It then takes the first element of this array (aka the first line of text) and sets that as the name.

I didn’t try that at all, though. Just wrote it down. So test it on copies of your data first!
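
A slightly more defensive take on the first-line extraction, as a sketch only. The helper name firstLine is hypothetical and not part of DEVONthink's scripting interface. It assumes the OCR text might start with blank lines or use \r\n line endings, neither of which is confirmed in this thread.

```javascript
// Hypothetical helper (not part of DEVONthink's API): returns the first
// non-empty line of a text, or null if there is none.
function firstLine(text) {
  if (typeof text !== 'string') return null;
  // Split on \r\n, \n or \r so other line-ending styles also work.
  const lines = text.split(/\r\n|\n|\r/);
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed.length > 0) return trimmed;
  }
  return null;
}

// Sample shortened from the 'plain text' property quoted earlier in the thread.
const sample = '1976 Dvurkralove Wildlife Park\n3. November WM: None Design: J. Baláz';
console.log(firstLine(sample)); // 1976 Dvurkralove Wildlife Park
```

Inside performsmartrule, one could then write something like const n = firstLine(r.plainText()); if (n) r.name = n; so that records without usable text keep their current name.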

uimike · May 3, 2025, 2:50pm

Bingo!! thank you so much Christian!!

chrillek · May 3, 2025, 2:56pm

Let’s hope that the text recognition is reliable…

uimike · May 3, 2025, 3:23pm

It’s been flawless!!
Already processed 300+ screenshots

Now a curiosity: for screenshots from some stamp catalogs, the stamp set title cannot be cleanly isolated, just because of the web page layout.

In this case, I have been going to the screenshot and cropping the original png - so as to leave the stamp title as first line in the image, before running the OCR.

The default app for png is Preview. But Preview has an annoyance. After cropping, it wants to save a copy somewhere, and invokes the Save Dialog. Of course, you should not save to the internal DEVONthink file where the original lives.

So I switched to opening the pngs with Acorn - and Acorn does replace the original file cleanly.
Then OCR, run chrillek’s script, and done.

BLUEFROG · May 3, 2025, 4:00pm

The documents are in the Global Inbox?

uimike · May 3, 2025, 4:11pm

yep.

Ah, I see - it works ok when inside some database. Got it - thanks Jim!

(and, sure, I understand why a record in the Global Inbox cannot and should not itself be modified)

BLUEFROG · May 3, 2025, 8:35pm

It’s actually an issue specific to Preview and items in the user Library. It harkens back to the same time they made the Library a hidden directory.

uimike · May 3, 2025, 11:34pm

Got it - thanks Jim

troejgaard · May 4, 2025, 1:05am

After a little testing, I’m pretty sure the m flag is not enabled by default. In that case ^(.*)$ is too greedy, as ^ and $ don’t refer to the start/end of a line, but the input as a whole.

Instead, ^(.*)\n works. (Everything from start until a newline)

Or (?m)^(.*)$. It’s not necessary here, but it’s nice to be aware of the option. (Only the first match is captured).

chrillek · May 4, 2025, 2:39pm

Interesting. I tried ^(.*)$ on a PDF with text and it gave me nothing (quite the opposite of “too greedy”).
Then, ^(.*)\n gave me the first line of the PDF (as you said).

According to the ICU regex documentation, $ should match \u000a, which is the character representing \n. OTOH, they describe the effect of the m flag exactly as you do (and regex101.com shows the same behavior).

So, ^(.*)$ works for the OP, but it shouldn’t. Well.
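
For what it's worth, the three patterns can be compared in plain JavaScript. This is only an illustration: JavaScript's regex engine is not the ICU engine, so details may differ from what a smart rule does, and the sample text is shortened from the one quoted earlier. JavaScript takes the m flag on the regex literal rather than as an inline (?m).

```javascript
// Plain JavaScript regex engine, NOT ICU; shown only to illustrate
// how the m flag changes the meaning of ^ and $.
const text = '1976 Dvurkralove Wildlife Park\n3. November WM: None';

// Without the m flag, $ matches only at the very end of the input, and
// . does not match a newline, so the pattern cannot match at all.
const noFlag = /^(.*)$/.exec(text);

// Requiring a literal newline after the group captures the first line.
const untilNewline = /^(.*)\n/.exec(text);

// With the m flag, $ also matches just before each newline.
const multiline = /^(.*)$/m.exec(text);

console.log(noFlag);          // null
console.log(untilNewline[1]); // 1976 Dvurkralove Wildlife Park
console.log(multiline[1]);    // 1976 Dvurkralove Wildlife Park
```

In this engine, at least, the unflagged pattern produces no match at all rather than an overly long one, which is in line with the "nothing" result reported above.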

uimike · May 4, 2025, 2:54pm

Actually, that regex did not work for me, gave me exactly what it gives you - nada.
But, as your js was so perfectly suited for my needs, I am happy using it in a batch action.

I’ll play with the regex just for fun

uimike · May 4, 2025, 2:55pm

thanks troej, I’ll play with that (chrillek’s js is already working perfectly for me, but hey, why stop when the problem is solved?)