[SOLVED] Need a bit of help with a simple (?) AI task - file renaming

Hi, I am slowly learning how to use the AI features, and could use some help with a simple (?) task…
I OCR’d a number of screen captures, and it would be great if I could rename the records to exactly what is in the first line. Please check the attached image. How should I go about it? Thanks!

I’d like the filename to read “1976 Dvurkralove Wildlife Park”
(upload://55SJvECOFhoEotNkOiHCF1lzPYz.pdf) (184.4 KB)

Please post screenshots directly in your text.

I think AI is not needed for that kind of task. Something like

  • scan text for regular expression ‘^(.*)$’
  • Set name to ‘\1’

In a smart rule should be enough.

Did it, thanks Christian - I will try your suggestion!

EDIT: Hm, it doesn’t seem to be working.
This is the batch action I created; when I apply it, the filename does not change.

Since the screenshot must come from somewhere, there might also exist a possibility to ask the original app for the name of the group/folder/whatever it uses

I think probably not here, Christian, as it is just an Apple capture - if I understand what you are saying.

Using ScriptDebugger Explorer - I can see the following as the ‘plain text’ property of the record:

“1976 Dvurkralove Wildlife Park\n3. November WM: None Design: J. Baláz Perforation: 113/4 x 111/4\nI\n-\ny (l\nCáiMevenókí}\nOOh\nCuddlogngf...”

The first bit, up to the first newline \n, is what I want - I think I could write an AppleScript for that (1), but using regex is fun if I can :slight_smile:

(1) changing the delimiters to “\n” maybe

AS is overkill for that case, as is AI. And if it’s just the screenshot of a folder contents, one could perhaps set up something to capture the folder name?
I mean, if you’re going through your complete collection, automation might be helpful …

Ah, no - it will be hundreds of screenshots, taken from online stamp catalogs. As it happens, I adjust the screenshot selection area to start at the stamp set title. But there will be many, many, hence the usefulness of some automation.

Can you think of a js to capture the plain text until the first return / new line?

Easy:

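function performsmartrule(records) {
  records.forEach(r => {
    // Plain text of the record (the OCR result)
    const txt = r.plainText();
    // Everything before the first newline, i.e. the first line
    const newName = txt.split('\n')[0];
    r.name = newName;
  });
}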

It gets the plain text attribute, splits it at newlines, which gives an array. It then takes the first element of this array (aka the first line of text) and sets that as the name.

I didn’t try that at all, though. Just wrote it down. So test it on copies of your data first!
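
If the OCR output ever starts with a blank line or uses Windows-style line endings, a slightly more defensive variant might help. This is only a sketch that has not been run inside DEVONthink: `firstNonEmptyLine` is a made-up helper name, and it assumes that skipping blank lines and trimming surrounding whitespace is what you want.

```javascript
// Hypothetical helper (not part of DEVONthink's API): returns the first
// non-blank line of a text, trimmed, or null if there is none.
function firstNonEmptyLine(text) {
  if (typeof text !== 'string') return null;
  // Split on \r\n, a lone \r, or \n so common line endings are all handled
  const line = text.split(/\r\n|\r|\n/).find(l => l.trim().length > 0);
  return line === undefined ? null : line.trim();
}

function performsmartrule(records) {
  records.forEach(r => {
    const newName = firstNonEmptyLine(r.plainText());
    // Only rename when a usable line was found
    if (newName !== null) r.name = newName;
  });
}
```

Records without any recognizable text keep their current name instead of ending up with an empty one.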

Bingo!! thank you so much Christian!!


Let’s hope that the text recognition is reliable…

It’s been flawless!!
Already processed 300+ screenshots

Now a curiosity: for screenshots from some stamp catalogs, the stamp set title cannot be cleanly isolated, just because of the web page layout.

In this case, I have been going to the screenshot and cropping the original png - so as to leave the stamp title as first line in the image, before running the OCR.

The default app for png is Preview. But Preview has an annoyance. After cropping, it wants to save a copy somewhere, and invokes the Save Dialog. Of course, you should not save to the internal DEVONthink file where the original lives.

So I switched to opening the pngs with Acorn - and Acorn does replace the original file cleanly.
Then OCR, run chrillek’s script, and done.

The documents are in the Global Inbox?

yep.

Ah, I see - it works ok when inside some database. Got it - thanks Jim!

(and, sure, I understand why a record in the Global Inbox cannot and should not itself be modified)

It’s actually an issue specific to Preview and items in the user Library. It harkens back to the same time they made the Library a hidden directory.

Got it - thanks Jim

After a little testing, I’m pretty sure the m flag is not enabled by default. In that case ^(.*)$ is too greedy, as ^ and $ don’t refer to the start/end of a line, but the input as a whole.

Instead, ^(.*)\n works. (Everything from start until a newline)

Or (?m)^(.*)$. It’s not necessary here, but it’s nice to be aware of the option. (Only the first match is captured).
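
The three patterns can be compared with a small sketch in plain JavaScript. Take it as an illustration only: the smart rule action does not use JavaScript's regex engine, so details may differ, and here the `m` option is passed as a regex flag rather than written inline. The sample text is a shortened version of the OCR output quoted earlier, and `firstCapture` is a made-up helper.

```javascript
const text = '1976 Dvurkralove Wildlife Park\n3. November WM: None\nI';

// Returns capture group 1 of the first match, or null if nothing matches
function firstCapture(regex, input) {
  const m = input.match(regex);
  return m ? m[1] : null;
}

// No m flag: ^ and $ only anchor at the start/end of the whole input
const noFlag = firstCapture(/^(.*)$/, text);
// Everything from the start up to the first newline
const untilNewline = firstCapture(/^(.*)\n/, text);
// m flag: ^ and $ anchor at the start/end of each line
const multiline = firstCapture(/^(.*)$/m, text);

console.log(noFlag);        // null
console.log(untilNewline);  // 1976 Dvurkralove Wildlife Park
console.log(multiline);     // 1976 Dvurkralove Wildlife Park
```

In JavaScript, at least, the unflagged pattern finds nothing on multi-line input, because `.` does not cross a newline and `$` only matches at the very end.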

Interesting. I tried ^(.*)$ on a PDF with text and it gave me nothing (quite the opposite of “too greedy” :wink: )
Then, ^(.*)\n gave me the first line of the PDF (as you said).

According to the ICU regex documentation, $ should match at \u000a, which is the character representing \n. OTOH, they describe the effect of the m flag exactly as you do (and regex101.com shows the same behavior).

So, ^(.*)$ works for the OP, but it shouldn’t. Well.

Actually, that regex did not work for me, gave me exactly what it gives you - nada.
But, as your js was so perfectly suited for my needs, I am happy using it in a batch action :slight_smile:

I’ll play with the regex just for fun

Thanks troej, I’ll play with that (chrillek’s js is already working perfectly for me, but hey, why stop when the problem is solved :wink: )