I prefer to have my file names normalized to ‘pandoc-types-types-for-representing-a-structured-document’ rather than the raw ‘pandoc-types: Types for representing a structured document’, whose spaces and colon can cause problems in shell commands, URLs, and on other filesystems.
So I put together a little script that normalizes the names of the selected DEVONthink records while saving each original name into the Finder Comments field.
tell application id "DNtp"
	repeat with thisRecord in (selection as list)
		set theName to the name of thisRecord as text
		-- keep the original name in the Finder comment
		set the comment of thisRecord to theName
		-- "quoted form of" escapes the name for the shell, including any single quotes it contains
		set theNewName to do shell script "~/go/bin/sanitize " & quoted form of theName
		set the name of thisRecord to theNewName
	end repeat
end tell
You can find the ‘sanitize’ command on my GitHub; it is written in Go.
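The real ‘sanitize’ source is not shown in this thread. To give a rough idea of what such a tool can do, here is a minimal Go sketch under stated assumptions: the function name `sanitize` and every rule in it are hypothetical, not the actual implementation. It keeps ASCII letters and digits (lowercased), maps Ł/ł to l, drops nonspacing combining marks, and collapses every other run of characters into a single hyphen. It uses only the standard library, so it does not normalize its input; a precomposed letter such as é (U+00E9) would be treated as a separator unless the string were first decomposed, for example with `norm.NFD` from golang.org/x/text/unicode/norm.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitize is a hypothetical sketch, not the actual sanitize tool.
// It lowercases ASCII letters, keeps ASCII digits, maps Ł/ł to "l",
// drops nonspacing combining marks (Unicode category Mn), and turns
// every other run of characters into a single hyphen. Leading and
// trailing separators are dropped.
//
// Assumption: accented letters arrive decomposed (base letter + mark).
// Precomposed letters such as U+00E9 are treated as separators here;
// a fuller version would apply NFD normalization first.
func sanitize(name string) string {
	var b strings.Builder
	pendingHyphen := false

	emit := func(r rune) {
		// Only write a hyphen between two kept characters.
		if pendingHyphen && b.Len() > 0 {
			b.WriteByte('-')
		}
		pendingHyphen = false
		b.WriteRune(r)
	}

	for _, r := range name {
		switch {
		case r <= unicode.MaxASCII && (unicode.IsLetter(r) || unicode.IsDigit(r)):
			emit(unicode.ToLower(r))
		case r == 'Ł' || r == 'ł':
			// Ł/ł have no Unicode decomposition, so map them explicitly.
			emit('l')
		case unicode.Is(unicode.Mn, r):
			// Skip combining marks such as U+0301 (acute) or U+0328 (ogonek).
		default:
			pendingHyphen = true
		}
	}
	return b.String()
}

func main() {
	fmt.Println(sanitize("pandoc-types: Types for representing a structured document"))
	// pandoc-types-types-for-representing-a-structured-document

	fmt.Println(sanitize("Ło\u0301dz\u0301 2024")) // "Łódź 2024" in decomposed form
	// lodz-2024
}
```

Treating every non-kept character as a separator is what turns the colon plus space in the pandoc example into a single hyphen.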
Out of curiosity: why do you only convert Ł/ł to their ASCII “equivalents”, not the other Polish diacritics like the n and e with cedilla? I suppose that nowadays all filesystems support Unicode, so why get rid of Ł/ł?
Not at all, thanks for the explanation. I have no knowledge of Go, so I thought that you were handling only the upper/lowercase Ł. I suppose that one cannot combine L with any diacritical mark to get Ł, because there is no such mark (and please excuse me for using “cedilla”; I didn’t think of ogonek).
The other question, however, remains: why change these Unicode characters to an ASCII equivalent? Does it have something to do with Apple’s weird decision to use combining (decomposed) characters instead of precomposed ones?
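The point about Ł can be checked directly. Letters such as ę and ń can be written as a base letter followed by a combining mark (U+0328 COMBINING OGONEK, U+0301 COMBINING ACUTE ACCENT), and stripping the marks leaves plain e and n. Ł/ł (U+0141/U+0142) have no such decomposition: the stroke is part of a single code point, so a mark-stripping step leaves them untouched and they need an explicit mapping. Below is a minimal Go sketch using only the standard library; `stripMarks` is a hypothetical helper that assumes its input is already decomposed.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks removes nonspacing marks (Unicode category Mn) from s.
// It assumes s is already decomposed; it does not normalize anything.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative result tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	eOgonek := "e\u0328" // ę as e + COMBINING OGONEK
	nAcute := "n\u0301"  // ń as n + COMBINING ACUTE ACCENT
	lStroke := "\u0142"  // ł: one code point, no combining mark to remove

	fmt.Println(stripMarks(eOgonek), stripMarks(nAcute), stripMarks(lStroke))
	// e n ł
}
```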
The idea was triggered by my obsessive-compulsive nature and my sense of aesthetics. I really hate those weird filenames.
It also makes filenames safe for any existing or future filesystem (except for filename length, which is easy to fix), which matters for long-term archiving and portability.
The use of nonspacing marks (combining characters) is actually a feature of Unicode itself, so it is possible on every filesystem that supports Unicode names, not just on macOS.
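The practical trouble with combining characters is that the same visible name can be stored as two different byte sequences. In this minimal Go sketch (the helper `lengths` is hypothetical), "café" written with a precomposed é (NFC) and with e plus U+0301 COMBINING ACUTE ACCENT (NFD) looks identical on screen but compares unequal and has different lengths. Software that compares names byte for byte, as ext4 does by default, treats them as two different names.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths returns the UTF-8 byte length and the number of code points in s.
func lengths(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	nfc := "caf\u00e9"  // é as a single precomposed code point
	nfd := "cafe\u0301" // e followed by COMBINING ACUTE ACCENT

	fmt.Println(nfc == nfd) // false: different bytes, same rendering
	b1, r1 := lengths(nfc)
	b2, r2 := lengths(nfd)
	fmt.Println(b1, r1) // 5 4
	fmt.Println(b2, r2) // 6 5
}
```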
Ah, beauty is in the eye of the beholder. I actually like spaces and special characters in file names; I have always found ASCII too limited.
I’m aware that combining characters are part of Unicode. But AFAIK only Apple chose to use them for its file names, which leads to some interesting behavior if one tries to use such files in WordPress on a Linux system. See also “28 years after Unicode, we still can’t handle accents: PDF + macOS + URL = chaos” from The Eclectic Light Company.
Quite entertaining.