Hi, I am slowly learning how to use the AI features, and could use some help with a (?) task…
I OCR’d a number of screen captures, and it would be great if I could rename the records to exactly what is in the first line (see the attached image). How should I go about it? Thanks!
I’d like the filename to read “1976 Dvukralove Wildlife Park”
Since the screenshot must come from somewhere, there might also exist a possibility to ask the original app for the name of the group/folder/whatever it uses
AppleScript is overkill for that case, as is AI. And if it’s just a screenshot of a folder’s contents, one could perhaps set up something to capture the folder name?
I mean, if you’re going through your complete collection, automation might be helpful …
Ah, no - it will be hundreds of screenshots, taken from online stamp catalogs. As it happens, I adjust the screenshot selection area to start at the stamp set title. But there will be many, many, hence the usefulness of some automation.
Can you think of a js to capture the plain text until the first return / new line?
It gets the plain text attribute, splits it at newlines, which gives an array. It then takes the first element of this array (aka the first line of text) and sets that as the name.
I didn’t try that at all, though. Just wrote it down. So test it on copies of your data first!
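For reference, the approach described above could look roughly like this. It is a minimal sketch, not the script as originally posted: the application name "DEVONthink 3", the helper names `firstLine` and `renameSelectedRecords`, and the extra trimming of whitespace are assumptions for illustration. It is meant to run as JXA (for example from Script Editor) with the OCR'd records selected.

```javascript
// Pure helper: split the text at newlines and return the first element,
// trimmed of surrounding whitespace (the trimming is an added assumption).
function firstLine(text) {
  return text.split(/\r?\n/)[0].trim();
}

// Assumption: JXA environment with records selected in DEVONthink.
// "DEVONthink 3" is an assumed application name; adjust it to your version.
function renameSelectedRecords() {
  const app = Application("DEVONthink 3");
  app.selectedRecords().forEach(record => {
    const title = firstLine(record.plainText());
    // Skip records whose first line is empty, so no name gets blanked out.
    if (title.length > 0) record.name = title;
  });
}

// Only touch DEVONthink when the JXA `Application` global actually exists.
if (typeof Application !== "undefined") renameSelectedRecords();
```

The helper is kept separate from the DEVONthink part so it can be checked on plain strings before letting it loose on real records.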
It’s been flawless!!
Already processed 300+ screenshots
Now a curiosity: for screenshots from some stamp catalogs, the stamp set title cannot be cleanly isolated, just because of the web page layout.
In this case, I have been going to the screenshot and cropping the original png - so as to leave the stamp title as first line in the image, before running the OCR.
The default app for png is Preview. But Preview has an annoyance. After cropping, it wants to save a copy somewhere, and invokes the Save Dialog. Of course, you should not save to the internal DEVONthink file where the original lives.
So I switched to opening the pngs with Acorn - and Acorn does replace the original file cleanly.
Then OCR, run chrillek’s script, and done.
After a little testing, I’m pretty sure the m flag is not enabled by default. In that case ^(.*)$ is too greedy, as ^ and $ don’t refer to the start/end of a line, but the input as a whole.
Instead, ^(.*)\n works. (Everything from start until a newline)
Or (?m)^(.*)$. It’s not necessary here, but it’s nice to be aware of the option. (Only the first match is captured).
Interesting. I tried ^(.*)$ on a PDF with text and it gave me nothing (quite the opposite of “too greedy”).
Then, ^(.*)\n gave me the first line of the PDF (as you said).
According to the ICU regex documentation, $ should match \u000a, which is the character representing \n. OTOH, they describe the effect of the m flag exactly as you do (and regex101.com shows the same behavior).
So, ^(.*)$ works for the OP, but it shouldn’t. Well.
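The three variants can be compared side by side in plain JavaScript. This is only a sketch: JavaScript's regex engine is not ICU, so it says nothing definitive about DEVONthink's behavior, and the sample text is made up. JavaScript also writes the flag after the pattern (`/…/m`) rather than inline as (?m). For this particular case, though, the rules are the same: without a flag, `.` does not cross a newline and `$` does not match in the middle of the text.

```javascript
// Made-up OCR text: a title line followed by more text.
const text = "1976 Dvukralove Wildlife Park\nSecond line of OCR text";

// No flags: `.` stops at the newline and `$` needs the end of the input,
// so the pattern as a whole fails on multi-line text.
const noFlag = /^(.*)$/.exec(text);

// Explicit newline: capture everything before the first line break.
const untilNewline = /^(.*)\n/.exec(text);

// Multiline flag: `$` may also match just before a line break.
const multiline = /^(.*)$/m.exec(text);

console.log(noFlag);           // null
console.log(untilNewline[1]);  // 1976 Dvukralove Wildlife Park
console.log(multiline[1]);     // 1976 Dvukralove Wildlife Park
```

So "no match at all" is what one would expect from ^(.*)$ on text with more than one line, while the other two patterns return the title.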
Actually, that regex did not work for me, gave me exactly what it gives you - nada.
But, as your js was so perfectly suited for my needs, I am happy using it in a batch action.