Script to sanitise file names

I remember we’ve discussed this here, but I’m not able to find the thread.

The idea is to have a script, used in a Smart Rule, that removes every character that isn’t a letter or digit from a file name. I put some files into a specific folder, and a Smart Rule then distributes those files into indexed folders. Sometimes a file name contains characters that macOS allows but Windows doesn’t.

Thanks.

If you put this handler inside a Script Library …

-- Replace non-alphanumeric characters with delimiter

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

set theFilename to "_Bla - Blubb? 0123 a"
set theDelimiter to " "
set theFilename_sanitized to my sanitizeText(theFilename, theDelimiter)

on sanitizeText(theText, theDelimiter)
	try
		set theString to current application's NSString's stringWithString:theText
		set theCharacterSet to current application's NSCharacterSet's alphanumericCharacterSet()
		set theComponents to theString's componentsSeparatedByCharactersInSet:(theCharacterSet's invertedSet())
		
		set newText_list to {}
		repeat with thisComponent in theComponents
			set thisComponent to thisComponent as string
			if thisComponent ≠ "" then
				set end of newText_list to thisComponent
			end if
		end repeat
		
		set d to AppleScript's text item delimiters
		set AppleScript's text item delimiters to theDelimiter
		set theText_sanitized to newText_list as text
		set AppleScript's text item delimiters to d
		return theText_sanitized
		
	on error error_message number error_number
		activate
		if the error_number is not -128 then display alert "Error: Handler \"sanitizeText\"" message error_message as warning
		error number -128
	end try
end sanitizeText

… and call it in your Smart Rule like

set theFilename_sanitized to script "Your Script Library Name"'s sanitizeText(theFilename, theDelimiter)

it should do what you want.

To create a Script Library:


Wow, impressive instructions, but… it isn’t working. I’m really hopeless at making these things work.

My Smart Rule:

With the script:

The error:
Screenshot 2022-05-19 at 12.17.41

If I replace theFilename with theRecord, it complains about theDelimiter, and if I then replace theDelimiter with " ", it shows this error:
Screenshot 2022-05-19 at 12.13.21

I guess in this case it’s passing the entire document rather than its name, but I’m not sure.

I’m going to wait for Pete to respond to your post, but when I looked through the instructions yesterday I was surprised to see that theFilename is set to a fixed string in the script, rather than being defined as the record’s name somewhere along the way. In fact, I think this block

set theFilename to "_Bla - Blubb? 0123 a"
set theDelimiter to " "
set theFilename_sanitized to my sanitizeText(theFilename, theDelimiter)

probably shouldn’t be included in the script library. But again, I’m unsure why a script library is being used at all here, rather than just including the handler in the Smart Rule script, so I’d need Pete’s feedback.

In any case, your script (used to call the handler) needs to define theFilename and theDelimiter to be able to pass those details to the handler.

set theFilename to name of theRecord
set theDelimiter to " "

should do it (again, I have not taken the time to look at the handler, so I’m not sure whether " " is the delimiter you need).

You will need to set the name of the record to theFilename_sanitized if you intend for the result of the handler to be used as the name of the record.


Here’s a plain AppleScript version that only recognizes the letters A to Z and the digits 0 to 9 (ignoring case and diacritics), so it’s limited to Western languages, but it can be used in any script:

on sanitizeText(theText, theDelimiter)
	local theResult, theCharacter, wasAlphaNum
	set theResult to ""
	-- with these settings, "contains" below also matches a to z and accented letters such as "é"
	ignoring diacriticals and case
		set wasAlphaNum to false
		repeat with i from 1 to length of theText
			set theCharacter to character i of theText
			if "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" contains theCharacter then
				-- keep letters and digits unchanged
				set theResult to theResult & theCharacter
				set wasAlphaNum to true
			else if wasAlphaNum then
				-- a run of other characters becomes a single delimiter
				-- (leading ones are dropped; a trailing run still leaves one delimiter)
				set theResult to theResult & theDelimiter
				set wasAlphaNum to false
			end if
		end repeat
	end ignoring
	return theResult
end sanitizeText
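If the only goal is Windows compatibility, as in the original question, a narrower option is to replace just the characters Windows doesn’t allow in file names (< > : " / \ | ? *) and leave everything else, including spaces, dots and non-Latin letters, untouched. This is a sketch rather than a tested Smart Rule: the handler name replaceWindowsReservedCharacters and the replacement string are made up for illustration, and it doesn’t deal with control characters, trailing dots or spaces, or reserved names such as CON or NUL, which Windows also treats specially.

```applescript
-- Hypothetical handler: replaces only the characters Windows forbids in file names
-- (< > : " / \ | ? *) with theReplacement. Control characters, trailing dots/spaces
-- and reserved names such as CON or NUL are NOT handled here.
on replaceWindowsReservedCharacters(theText, theReplacement)
	set theReserved to "<>:\"/\\|?*"
	set theResult to ""
	repeat with theItem in (characters of theText)
		-- "repeat with ... in" yields references, so get the actual character
		set theCharacter to contents of theItem
		if theReserved contains theCharacter then
			set theResult to theResult & theReplacement
		else
			set theResult to theResult & theCharacter
		end if
	end repeat
	return theResult
end replaceWindowsReservedCharacters

-- Example call: ":", "/", the quotes and "?" are replaced, everything else is kept
replaceWindowsReservedCharacters("Report: Q1/Q2 \"draft\"?.pdf", "-")
```

With the example arguments, the call returns "Report- Q1-Q2 -draft--.pdf". In a Smart Rule it would be called the same way as sanitizeText, i.e. with `my` from inside the `tell application id "DNtp"` block.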

With these modifications, everything worked fine:

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			set theFilename to name of theRecord
			set theDelimiter to " "
			set theFilename_sanitized to script "Sanitize Text"'s sanitizeText(theFilename, theDelimiter)
			set name of theRecord to theFilename_sanitized
		end repeat
	end tell
end performSmartRule

Now I’m going to try @cgrunenberg’s script, since I can add some characters, like spaces, to the string.


OK, replacing the script works as well, but I can’t add "." or any other symbol, as it gets filtered out too.

No problem, it’s good enough as it is.

Thanks to all!!!
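One way to keep a few symbols such as "." or "-" is to pass them to the handler instead of relying on the hard-coded string. The sketch below is a hypothetical variant of the plain AppleScript sanitizeText handler; the name sanitizeTextKeeping and the third parameter theExtraCharacters are made up for illustration. Like the original, it ignores case and diacritics when matching and turns every run of other characters into a single delimiter.

```applescript
-- Hypothetical variant of the plain AppleScript sanitizeText handler:
-- theExtraCharacters lists symbols (e.g. ".-") that are kept as-is, like letters and digits.
on sanitizeTextKeeping(theText, theDelimiter, theExtraCharacters)
	local theAllowed, theResult, theCharacter, wasKept
	set theAllowed to "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" & theExtraCharacters
	set theResult to ""
	ignoring diacriticals and case
		set wasKept to false
		repeat with i from 1 to length of theText
			set theCharacter to character i of theText
			if theAllowed contains theCharacter then
				-- allowed character: copy it unchanged
				set theResult to theResult & theCharacter
				set wasKept to true
			else if wasKept then
				-- first disallowed character after a kept one: emit a single delimiter
				set theResult to theResult & theDelimiter
				set wasKept to false
			end if
		end repeat
	end ignoring
	return theResult
end sanitizeTextKeeping

-- Example call: "." and "-" survive, "_" and "?" are dropped
sanitizeTextKeeping("_Bla - Blubb? 0123 a.pdf", " ", ".-")
```

The example call returns "Bla - Blubb 0123 a.pdf". As with the original handler, leading disallowed characters are dropped, but a name that ends in a disallowed character keeps one trailing delimiter (for example, "abc?" with delimiter "_" becomes "abc_").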

Did you only change the script library without relaunching DEVONthink? Then it might be a caching issue. In addition, it’s actually no longer necessary to use a library; you could add my sanitizeText function directly to the Smart Rule’s script code.

Sorry, I should also have posted a Smart Rule script.

-- Smart Rule - Replace non-alphanumeric characters with delimiter

on run
	tell application id "DNtp" to my performSmartRule(selection as list)
end run

on performSmartRule(theRecords)
	tell application id "DNtp"
		try
			set theDelimiter to " "
			
			repeat with thisRecord in theRecords
				set thisRecord_NameWithoutExtension to name without extension of thisRecord
				set thisRecord_NameWithoutExtension_sanitized to script "Sanitize Text"'s sanitizeText(thisRecord_NameWithoutExtension, theDelimiter)
				set name of thisRecord to thisRecord_NameWithoutExtension_sanitized
			end repeat
			
		on error error_message number error_number
			if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
			return
		end try
	end tell
end performSmartRule


Yes, I really messed up the instructions. I thought it was clear that only the handler (i.e. from on sanitizeText through end sanitizeText) goes into the script library. However, it shouldn’t make a difference whether the three lines outside the actual handler are included or not, as calling a handler in a script library doesn’t run all the code in it, only the code inside the handler that’s called.

I tried that, but it failed. I’m still not sure why ASObjC sometimes seems to work in a Smart Rule script but most often doesn’t over here. It would of course be a lot easier for everyone if it could simply be included in Smart Rule scripts.


Thanks for that - obvious now I know :see_no_evil:

Ah - thanks for the explanation

It would of course be a lot easier for everyone

Who’s everyone, @pete31 :stuck_out_tongue: :wink:


Me Myself And I :wink:

Oh… those three! :roll_eyes:
:slight_smile:


That’s actually why I wrote the plain AppleScript version of the function: to show that frameworks are rarely necessary.


I’m three as well: I’m the Dumbest in the film “Dumb and Dumber”. :woozy_face:

This is my new script:

I get this error on execution:

20/5/22, 10:05:41: Organizar Scraps	on performSmartRule (Can’t continue sanitizeText.)

Could you please post the code instead of a screenshot? Thanks!

Sorry (now you know why I’m the Dumbest :sweat_smile:)

on sanitizeText(theText, theDelimiter)
	local theResult, theCharacter, wasAlphaNum
	set theResult to ""
	ignoring diacriticals and case
		set wasAlphaNum to false
		repeat with i from 1 to length of theText
			set theCharacter to character i of theText
			if "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" contains theCharacter then
				set theResult to theResult & theCharacter
				set wasAlphaNum to true
			else if wasAlphaNum then
				set theResult to theResult & theDelimiter
				set wasAlphaNum to false
			end if
		end repeat
	end ignoring
	return theResult
end sanitizeText

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			set theFilename to name of theRecord
			set theDelimiter to " "
			set theFilename_sanitized to sanitizeText(theFilename, theDelimiter)
			set name of theRecord to theFilename_sanitized
		end repeat
	end tell
end performSmartRule


This should be…

set theFilename_sanitized to my sanitizeText(theFilename, theDelimiter)
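The reason `my` is needed: inside a `tell application` block, a call to one of the script’s own handlers is sent to the application being told unless it is prefixed with `my` (or written as `sanitizeText(...) of me`). DEVONthink doesn’t know a `sanitizeText` command, hence the “Can’t continue sanitizeText.” error. Below is a minimal, self-contained illustration; the handler name addExclamation and the use of Finder as the tell target are assumptions chosen only so that it runs on any Mac.

```applescript
-- Hypothetical handler, defined at the top level of the script
on addExclamation(theText)
	return theText & "!"
end addExclamation

tell application "Finder"
	-- Without "my", this call would be sent to Finder, which doesn't know the
	-- command, and the script would stop with a "Can't continue" error.
	set theResult to my addExclamation("Hello")
	-- Equivalent forms: addExclamation("Hello") of me
	--                   tell me to addExclamation("Hello")
end tell

theResult
```

The library call used earlier in the thread didn’t need `my`, because it was addressed explicitly to `script "Sanitize Text"`.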


“my” error. Zillion Zanks, Christian.