I was not aware of the dependency on bash. Good to know.
It is Sonoma, but I do indeed use bash as my shell.
I did it this way because I often have HTML specifying CSS and such before the first Markdown line. In my case, adding it to the first line would not work, and my approach felt more bulletproof.
No worries! It’s just good to know and declare the dependencies of public code… or to avoid them altogether (which is our preference whenever possible).
Since I was rightly told that the method was not portable, I thought I would follow up with a regex solution that should be portable, using Python to do the matching.
Note that Python does not understand POSIX regular expressions (such as [[:space:]]), so the regex needs to be changed:
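on getMatch(s, regex)
	# Returns a list: the whole match, then each capture group in order,
	# or an empty list if regex does not match s. Needs python3 3.8+ (for ":=").
	local theScript, theCommand
	# Build a tiny Python program; its escaped \n and \t only become newlines/tabs when Python parses the exec() string
	set theScript to "import re\\nif result := re.search(pattern=r" & quoted form of regex & ¬
		", string=" & quoted form of s & ¬
		"):\\n\\tprint(\\\\\\\"\\\\\\n\\\\\\\".join(list(" & ¬
		"map(lambda x: result.group(x), range(0, len(result.groups()) + 1)))))\\n"
	# do shell script does not use the user's locale, so set a UTF-8 one explicitly for python3
	set theCommand to "export LANG='" & user locale of (system info) & ¬
		".UTF-8';" & "/usr/bin/python3 -c \"exec(\\\"" & theScript & "\\\")\""
	tell me to do shell script theCommand
	return paragraphs of result
end getMatch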
# Example of using getMatch()
set testLine to "# My testline"
log getMatch(testLine, "^#\\s(.+)$")
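To see what the embedded Python does without the layers of escaping, here is a minimal standalone sketch of the same logic. The name get_match is hypothetical and only mirrors the AppleScript handler; unlike the embedded script, it substitutes "" for optional groups that did not participate in the match instead of failing on them.

```python
import re


def get_match(s: str, regex: str) -> list[str]:
    """Whole match first, then each capture group; [] if there is no match."""
    result = re.search(regex, s)
    if result is None:
        # The handler gets an empty string back and `paragraphs of ""` is {}.
        return []
    # group(0) is the whole match; groups 1..n are the capture groups.
    # An unmatched optional group yields None, so fall back to "".
    return [result.group(i) or "" for i in range(len(result.groups()) + 1)]


if __name__ == "__main__":
    print("\n".join(get_match("# My testline", r"^#\s(.+)$")))
```

With the handler's example input, the first element is the whole match "# My testline" and the second is the captured "My testline".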
I didn’t check whether python3 comes only with Xcode, as this article asserts. The main issue here is that AppleScript does not support regular expressions, and trying to use them with AS anyway will always be a bit convoluted.
But what’s wrong with employing the tools that are certain to come with macOS anyway, like sed or egrep? For example,
set theScript to "egrep '#\s+\S.*$' " & quoted form of s
finds a first-level headline in a Markdown document just fine, and it is a lot less complicated. (What happened to “readability”, btw?)
If you want only the text of the headline returned, you could use sed:
set theScript to "sed -nE -e 's/^#[[:space:]]+([^[:space:]].*)$/\1/p' " & quoted form of s
In both cases (egrep and sed), some additional calisthenics with quoted form of might be needed. But there’s no need for Python with all the funny backslashes, nor for shopt: the good old Unix tools suffice, even in the slightly mutilated variants Apple delivers.
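To make the -n and p flags of that sed call concrete, here is a rough Python model of it: it walks the input line by line and keeps only the lines on which the substitution succeeded, so any HTML before the first headline is simply skipped. This is an illustration only; sed_headlines and HEADLINE are made-up names, and Python's \s is only approximately the same as POSIX [[:space:]].

```python
import re

# Python's re has no POSIX classes, so [[:space:]] becomes \s and
# [^[:space:]] becomes \S in the translated pattern.
HEADLINE = re.compile(r"^#\s+(\S.*)$")


def sed_headlines(text: str) -> list[str]:
    """Rough model of: sed -nE 's/^#[[:space:]]+([^[:space:]].*)$/\\1/p'"""
    out = []
    for line in text.splitlines():  # sed works on one line at a time
        new_line, count = HEADLINE.subn(r"\1", line)
        if count:  # -n suppresses output; p prints only substituted lines
            out.append(new_line)
    return out


md = "<style>p {}</style>\n# headline\nsecond line of text\n#not a headline"
print(sed_headlines(md))
```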
Aside: Now, your RE is too lax. ^#\s(.+)$ will match a line consisting only of a # followed by spaces (say, three of them) and put the trailing spaces (here, two) in the capturing group.
The REs I suggested above match a leading ‘#’ followed by at least one space (more are possible) and then at least one non-space character (\S or [^[:space:]]).
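The two spellings of “non-space” are not interchangeable across tools: sed -E and egrep accept POSIX classes, Python’s re uses \s and \S. A small sketch, assuming CPython 3.7 or later (which warns about “possible nested set” when it sees [[ in a pattern):

```python
import re
import warnings

posix_style = r"^#[[:space:]]+([^[:space:]].*)$"  # what sed -E / egrep accept
python_style = r"^#\s+(\S.*)$"                    # the same idea in Python syntax

line = "# headline"

with warnings.catch_warnings():
    # Python reads "[[:space:]" as an ordinary set of the characters
    # "[", ":", "s", "p", "a", "c", "e" and emits a FutureWarning; hide it here.
    warnings.simplefilter("ignore", FutureWarning)
    posix_result = re.search(posix_style, line)

python_result = re.search(python_style, line)

print(posix_result)            # None: [[:space:]] is not a class to Python
print(python_result.group(1))  # headline
```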
But frankly, why make AS jump through all these hoops instead of just using Python or JavaScript in the first place?
It was never really a good idea to use the built-in Python on macOS if you rely on the very valuable add-on libraries, anyway.
Python [probably] available forever at
Using Anaconda is also recommended when working with Python.
Possible. I did not install it deliberately.
egrep doesn’t let me get the value of the capture group. With some additional scripting, sed could be used, although it is awkward.
It becomes more difficult to put this generically in a method with an unknown number of capture groups.
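In a language with native regex support, the unknown number of groups is not a problem, since a match object reports however many groups the pattern defines. A minimal Python sketch; all_groups is a hypothetical helper, not part of any script in this thread:

```python
import re


def all_groups(text: str, regex: str) -> list[tuple[str, ...]]:
    """Capture groups of every match, whatever the number of groups.

    re.MULTILINE makes ^ and $ match at every line boundary, so lines
    (e.g. HTML) before the first headline do not get in the way.
    """
    return [m.groups(default="") for m in re.finditer(regex, text, re.MULTILINE)]


doc = "<style>h1 {}</style>\n# First\ntext\n# Second part"
print(all_groups(doc, r"^#\s(\S.*)$"))
print(all_groups("# A b", r"^#\s(\S+)\s(\S+)$"))
```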
I don’t think this is correct. Although I should have written it as "^#\\s*(.+)$" to allow multiple spaces, the purpose of the capture group is to return the value after the # and the spaces, i.e. the value 'My testline'.
The goal is to capture at least one character that is not a space, up to the end of the line.
I think it is.
- ^# matches a # sign at the start of the line
- \s matches a single whitespace character, such as a space
- . matches any character, including a space
- + means at least one of the preceding element, i.e. “one or more of anything, including a space”.
That’s not at all the same as capturing “at least one character that is not a space”, as you’d also capture the line consisting of a # sign followed by two spaces and nothing else. Check it out.
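Checking it out is quick in Python, whose re module treats \s, . and + the same way here. A minimal sketch; op_regex is just a local name for the pattern from the getMatch() example:

```python
import re

op_regex = r"^#\s(.+)$"  # the pattern used in the getMatch() example

# "#" followed by two spaces and nothing else: \s takes the first space,
# and (.+) happily takes the second one.
blank = re.search(op_regex, "#  ")
print(repr(blank.group(1)))  # ' '
```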
^#\s*(.*) fed with #headline will put ‘headline’ in the capturing group.
^#\s(.*) fed with #  headline (two spaces between ‘#’ and ‘h’) will put ’ headline’ (note the leading space!) there.
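Both effects can be reproduced in Python. This sketch uses the lax \s* pattern from above and the getMatch() pattern ^#\s(.+)$ to show the leading space; the variable names are only illustrative:

```python
import re

# \s* allows zero spaces, so "#headline" with no space at all is accepted too.
lax = re.search(r"^#\s*(.*)", "#headline")
print(repr(lax.group(1)))  # 'headline'

# With a single \s, the second of two spaces becomes part of the captured text.
single = re.search(r"^#\s(.+)$", "#  headline")
print(repr(single.group(1)))  # ' headline'
```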
Using * (any number of spaces) is too lax. So is + (at least one space), in this case. You want exactly one space directly followed by something that is not a space, like ^#\s(\S.*)$ (in customary RE syntax).
That is:
- a # sign at the start of the line
- followed by exactly one whitespace character (\s)
- followed by a non-space character (\S)
- followed by anything at all up to the end of the line.
The non-space character and everything following it will be put in the capturing group.
With REs, it is important to test not only success but also failure: make sure that your RE fails on non-matching input, not only that it succeeds on matching input.
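A compact way to follow that advice is to keep two lists side by side, one of inputs that must match and one of inputs that must not. A minimal Python sketch for ^#\s(\S.*)$; the sample lines are made up for illustration:

```python
import re

# The "exactly one space, then a non-space" pattern, in Python syntax.
HEADLINE = re.compile(r"^#\s(\S.*)$")

# Inputs that should match, with the expected captured text.
should_match = {
    "# headline": "headline",
    "# My testline": "My testline",
    "# a": "a",
}

# Inputs that should NOT match: testing failures matters as much as successes.
should_fail = [
    "#headline",                 # no space after '#'
    "#  headline",               # two spaces: exactly one is required
    "#  ",                       # '#' followed by spaces only
    "## second level",           # second-level headline
    "text # not at line start",  # '#' not at the start of the line
]

for text, expected in should_match.items():
    m = HEADLINE.search(text)
    assert m is not None and m.group(1) == expected, text

for text in should_fail:
    assert HEADLINE.search(text) is None, text

print("all checks passed")
```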
Yes, my first example was not correct. This, however, is:
set s to "# headline
second line of text"
set theScript to "echo " & quoted form of s & " | sed -nE -e 's/^#[[:space:]]+([^[:space:]].*)$/\\1/p' "
set theResult to do shell script theScript
And yes, more than one capturing group might cause more work. Your Python example returns "# My testline My testline" here.
The question remains why one would want to make AS jump through all these hoops instead of using Python or JavaScript in the first place. And before I forget: you can, of course, use the ObjC bridge from AS to employ NSRegularExpression. Not exactly the pinnacle of readability, but possible.
Omg, this works beautifully and exactly as I wished! Thanks for a wonderful script! This is one of the reasons DT is so awesome: a helpful community to go with the app’s extensive scripting and tinkering capabilities! Since starting with DT, I’ve picked up Markdown and (some) AppleScript, and I’m now considering other scripting options (JXA et al.). It’s a journey, not only in terms of scripting but also in getting perspective on the contemporary app economy (using flashy, limited-functionality tech to lock users and their data into rigid, pay-for-use, web-app-based systems; DT is a soothing antidote to all of this).
The zsh shell offers its own version, suspiciously named zshoptions.