Searching for non-ASCII text

Hi, is there a way to find documents containing non-ASCII text? I’m trying to filter all documents that contain Ethiopic script (Unicode range 1200–137F).

None I’m aware of.
Do you have an example document you could ZIP and post?

Thanks. I’d found this; it clearly applies only to InDesign, but I wondered if there might be a similar mechanism in DT:

Here is a link to a zipped RTF containing Ethiopic script.

Could it be scripted? Something along the lines of using a regex to search the document’s plain text for characters in the Unicode range 1200–137F, and then tagging the document if one is found?

It could:

function performsmartrule(recs) {
  let app=Application("DEVONthink 3");
  recs.forEach(r => {
	if (/[\u{1200}–\u{137F}]/.test(r.plainText())) {
	  let tags = r.tags();
	  tags.push("ethiopian");
	  r.tags = tags;
	}
  })
}

(() => {
  let app= Application("DEVONthink 3");
  performsmartrule(app.selectedRecords());
})()

I’m sure that can also be done in AppleScript with the usual ObjC bridge. And the whole tag-setting operation might be too convoluted here, but I don’t have time to make it nicer :wink:

I thought it might :wink: thanks @chrillek

@pointeast all you need to do is run the script on your documents (test it with a subgroup first) using a smart rule and set up a smart group which looks for documents tagged “ethiopian” and you’re good to go.

Many, many thanks for your help, but I’m struggling a bit with setting up the smart rule. This is JavaScript, right?

Yep. As I said, one can achieve the same result with AppleScript, writing a lot more code.
You might want to search for a post by @pete31 on the topic “regular expression” if you feel more comfortable with AppleScript. His code uses AppleScript-ObjC.

The result does not depend on the scripting language, though.

Still trying to figure out how to get it working. But thank you both so much nonetheless!

Did you read the documentation on “Automation”, especially the part on “external scripts” in the smart rule section? What it says applies to all smart rule scripts, regardless of the language they’re written in.

If you still have problems, please describe them in detail: what did you try, what was the outcome. Otherwise it’s impossible to give any advice.

We’ll get you there :slight_smile: As @chrillek wrote, let us know what you’re doing (screenshots really help) and the forum will converge on you with suggestions :slight_smile:

It wasn’t recognising anything until I changed “recs” to “records” in the code, which then recognised the one file in the test folder with Ethiopic script and tagged it accordingly. But when I then tried to run it again it failed to recognise any others.

And I’m still getting an “Error: Error: No Error” entry in the log.

Does that make any sense?

n.b. I have very limited coding knowledge (if that wasn’t already clear).

Ah, the Error No Error conundrum. We need the help of the resident JS expert - @chrillek help!

Oh, and please post a copy of the smart rule you were using to trigger the code - or if you weren’t using a smart rule, let us know how you were running the code :slight_smile: cheers

So you were using it in a smart rule? Please, make clear what you’re doing, let’s not guess.
The script worked here standalone in the script editor as it is – no need to change the variable names. Even given the somewhat erratic behaviour of JavaScript with DT, I doubt that changing recs to records is necessary at all.
But since you didn’t post any more details…

This is normal not-normal behaviour. As I said: JS and DT behave a bit erratically together.
Anyway: You need to be more forthcoming with information, otherwise nobody can help you.

It should also work outside of a smart rule: Select the records you want to process and run the script on them. You have to install it in DT’s scripting directory first, as described in the documentation.

Yes, I was using it as a smart rule.

Script created in Script Editor (back to “recs”), and saved in Smart Rules folder:

I just tried running it as a standalone script, having moved the script into the scripting directory, on a number of selected files, and it applied the tag to all of them (i.e. not just those containing Ethiopic script).

Apologies for former lack of clarity…

Indeed. Weird. I had tried it only on a file with Ethiopic characters…

New version, tested with Latin and Ethiopic script and only tagging the latter. Interestingly, there’d be an easier regular expression, namely /\p{sc=Ethiopic}/. But that throws an error, probably because of Apple’s JavaScript implementation.

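function performsmartrule(recs) {
  // Character class for the Ethiopic block, U+1200 to U+137F.
  // The u flag makes \u{...} a code point escape; the range uses an ASCII hyphen.
  const re = /[\u{1200}-\u{137F}]/u;
  recs.forEach(r => {
    if (re.test(r.plainText())) {
      // Read the current tags, append the new one, and write the list back.
      const tags = r.tags();
      tags.push("ethiopian");
      r.tags = tags;
    }
  });
}

// Standalone use: apply the same function to the records selected in DEVONthink.
(() => {
  const app = Application("DEVONthink 3");
  performsmartrule(app.selectedRecords());
})()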
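
For anyone wondering why the first version tagged every file: a minimal sketch of the difference, runnable in any current JavaScript engine outside DEVONthink. The sample strings and variable names are made up for illustration. The last pattern assumes an engine with ES2018 Unicode property escapes, which need the u flag; whether the engine behind DT scripts accepts it is not verified here.

```javascript
// First version: no u flag and an en dash instead of a hyphen.
// Without the u flag, \u{1200} is not a code point escape, so the class is
// just the literal characters u { 1 2 0 } – 3 7 F.
const loose = /[\u{1200}–\u{137F}]/;

// New version: u flag plus ASCII hyphen gives the range U+1200..U+137F.
const strict = /[\u{1200}-\u{137F}]/u;

// Script-based alternative (assumption: the engine supports property escapes).
const byScript = /\p{Script=Ethiopic}/u;

const latin = "Invoice 2021, due in 30 days";
const ethiopic = "\u1230\u120B\u121D"; // three characters from the Ethiopic block

console.log(loose.test(latin));       // true: matches the digits and the letter u
console.log(strict.test(latin));      // false
console.log(strict.test(ethiopic));   // true
console.log(byScript.test(ethiopic)); // true
```

Note that the script-based pattern is slightly broader than the block range, since Ethiopic characters also exist outside U+1200..U+137F.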

As to your smart rule: Please do not select “any documents” but only those with a word count > 0.
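
The word-count condition keeps records without any text away from the script. A similar guard could live in the script itself. The following is only a sketch under that assumption, not part of the tested version; the helper names hasEthiopic and withTag are hypothetical, and they work on plain strings and arrays so they can be tried outside DT.

```javascript
const ETHIOPIC = /[\u{1200}-\u{137F}]/u;

// True only for non-empty strings containing a character from U+1200..U+137F.
function hasEthiopic(text) {
  return typeof text === "string" && text.length > 0 && ETHIOPIC.test(text);
}

// Returns a new tag list with `tag` added once; existing tags are kept as-is.
function withTag(tags, tag) {
  return tags.includes(tag) ? tags.slice() : [...tags, tag];
}

console.log(hasEthiopic(""));                      // false
console.log(withTag(["inbox"], "ethiopian"));      // [ 'inbox', 'ethiopian' ]
console.log(withTag(["ethiopian"], "ethiopian"));  // [ 'ethiopian' ]
```

Inside performsmartrule, the loop body would then become something like: if (hasEthiopic(r.plainText())) r.tags = withTag(r.tags(), "ethiopian"); which also avoids adding the same tag twice when the rule runs again.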

Done! Thank you!