AppleScript, URL encoding, open a URL in Chrome

I use a service called SimpRead to read and annotate local HTML files in DEVONthink, generate permanent links, and save them to my notes. After configuring it, I can view the list of HTML files by opening the URL http://127.0.0.1:7026/static. I therefore want a Smart Rule that opens the selected HTML files in DEVONthink directly in SimpRead. I wrote a Smart Rule that matches all HTML files in the database and uses the following AppleScript code:

on urlencode(str)
	local str
	try
		return (do shell script "/bin/echo " & quoted form of str & ¬
			" | perl -MURI::Escape -lne 'print uri_escape($_)'")
	on error eMsg number eNum
		error "Can't urlEncode: " & eMsg number eNum
	end try
end urlencode

on performSmartRule(theRecords)
	tell application id "DNtp"
		set theSelection to the selection
		repeat with theRecord in theSelection
			set recPath to (path of theRecord)
			set pattern to "\\.\\/html\\/(\\d+)\\/(.+\\.html)"
			set newString to do shell script "echo " & quoted form of recPath & " | sed 's/" & pattern & "/\\1%2F\\2/'"
			
			tell application "Google Chrome"
				activate
				open location "http://127.0.0.1:7026/static/" & urlencode(newString)
			end tell
		end repeat
	end tell
end performSmartRule

However, this code does not run successfully, which is why I am seeking help here. I know very little about the code: I obtained it by generating it with ChatGPT, searching for posts on the DEVONtechnologies forum, and modifying it myself. I do not know how to debug in DEVONthink, and with my limited coding ability I have not been able to identify any syntax errors, so I am almost unable to modify the code to make it work properly. I would greatly appreciate any help I can get.

There are different approaches to solve your problem. Mine would be to start learning about these things and thus getting an understanding of the code. Which ChatGPT apparently has not.
A good starting point would be the fine DT manual. It has a section titled „Automation“ where you’ll find information on scripts, also on scripts for smart rules.


Another issue is that the script processes the selection but should use the theRecords parameter instead. Smart rules should never work on the selection, otherwise the results might not be the expected ones.

Thank you for your suggestion! Actually, I have learned a little JavaScript, though not much, just the basic syntax…

Do you mean I should replace the selection with theRecords, like this:

on urlencode(str)
	local str
	try
		return (do shell script "/bin/echo " & quoted form of str & ¬
			" | perl -MURI::Escape -lne 'print uri_escape($_)'")
	on error eMsg number eNum
		error "Can't urlEncode: " & eMsg number eNum
	end try
end urlencode

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in the records
			set recPath to (path of theRecord)
			set pattern to "\\.\\/html\\/(\\d+)\\/(.+\\.html)"
			set newString to do shell script "echo " & quoted form of recPath & " | sed 's/" & pattern & "/\\1%2F\\2/'"
			
			tell application "Google Chrome"
				activate
				open location "http://127.0.0.1:7026/static/" & urlencode(newString)
			end tell
		end repeat
	end tell
end performSmartRule

I have actually tried it out, and it still didn’t work.

I just have no idea what’s going wrong :rofl: Forgive me for my illiteracy about programming :smiling_face_with_tear:

The selection thing was the obvious mistake. The next one is that the parameter of performSmartRule is theRecords, but you then write “repeat with theRecord in the records”. This line will probably not even survive the syntax check.

The rest is a lot less obvious, due to a lack of comments (can’t or won’t ChatGPT illustrate what it’s trying to do?) and explanation from your side.

Firstly, I see a regular expression pattern
\.html\/(\d+)\/\(.+\.html)
of which I have no idea what it is trying to achieve. It would match strings like “.html//.html” and convert them into “/%2F.html” Given that this is the path to a record, why are you so heavily modifying it, i.e. throwing away the first .html (and why would there even be two “.html” extensions in this path)?

Secondly, I’m not sure that I really understand how this Simple Read thingy works. It looks as if they’re running a local web server at port 7026 and which shows you something (but what exactly?) at the location static. Your script appends something to this URI, namely a heavily modified path to a DT record. Did you try this concept in your browser? Did it work? What do the URLs really look like?

My take on that, with a simple script, not one to be used in a smart rule (and in JavaScript, of course):

(() => {
  const app = Application("DEVONthink 3");
  const chrome = Application("Google Chrome");
  const rec = app.selectedRecords();
  if (rec.length === 0) return;
  rec.forEach(r => {
    const URL = `http://127.0.0.1:7026/static/${encodeURIComponent(r.path())}`;
    chrome.openLocation(URL);
  })
})()

Put that into script editor, set its language selector in the upper-left corner to “JavaScript”, select a record in DT and run the script in script editor. Its debug area will show you what’s happening.

Thank you so much for your detailed reply!
SimpRead is able to read HTML files from a local folder. I have set it to read from “~/Databases/Info.dtBase2/Files.noindex/html”, which is the path of the HTML folder in my primary DEVONthink database. By opening the corresponding URL with SimpRead, you can generate a reading mode for the local HTML file, highlight text, and create a permanent deep link for each highlighted annotation. You can also export it to note-taking software like Notion or Obsidian if needed. Once the local server is turned on, the URL that SimpRead uses for each HTML file in the folder is “http://127.0.0.1:7026/static/” + the URL-encoded path of the file relative to that folder. For example, if I have an HTML file named “cubox_export” with a path of “./html/b/cubox_export.html”, the URL that SimpRead will use to open it is “http://127.0.0.1:7026/static/b%2Fcubox_export.html”.
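
The URL scheme described here can be sketched as a small plain-JavaScript function. This is only an illustration under stated assumptions: `simpReadUrl` and `BASE` are hypothetical names, and the served folder is assumed to be the first “/html/” directory in the path.

```javascript
// Hypothetical helper: build the SimpRead URL for a file inside the served "html" folder.
const BASE = "http://127.0.0.1:7026/static/"; // server address quoted in the post

function simpReadUrl(path) {
  // Drop everything up to and including the first "/html/" (assumed to be the served folder).
  const relative = path.replace(/^.*?\/html\//, "");
  // encodeURIComponent turns the remaining "/" into "%2F", as in the example URL.
  return BASE + encodeURIComponent(relative);
}

console.log(simpReadUrl("./html/b/cubox_export.html"));
// → http://127.0.0.1:7026/static/b%2Fcubox_export.html
```

The same function also accepts an absolute path, since only the part after the first “/html/” is kept.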

If I’m reading it correctly, the debug info is: “Message not understood.” I’m still confused.

I won’t have time to check that before afternoon CET. But I now understand what the regular expression tries to do: remove the path to the html document and leave only the file name itself. That might be achieved with something like this
r.path().replace(/^.*?html/,"")
To get more detail in the script editor, open the debug page by clicking on the three lines at the bottom of the window. You can then follow in detail what’s happening.

Welcome @uu2003

and I obtained it by generating it through chatGPT…

ChatGPT is like a child magician. Sometimes it will surprise you, but often you just politely clap while rolling your eyes. :roll_eyes::stuck_out_tongue:

Learning to script will benefit you more in the long run than trying to rely on such things as chatGPT.


Thank you for your explanation, I now see the debug info which goes like:

app = Application("DEVONthink 3")
app.selectedRecords()
--> [app.databases.byId(2).contents.byId(45126)]
app.databases.byId(2).contents.byId(45126).path()
--> "~/Databases/Info.dtBase2/Files.noindex/html/b/cubox_export.html"
app = Application("Google Chrome")
app.openLocation(["http://127.0.0.1:7026/static/%2FUsers%2Fyutang%2FDatabases%2FInfo.dtBase2%2FFiles.noindex%2Fhtml%2Fb%2Fcubox_export.html"])
--> Error -1708: Message not understood.
**Result:**
Error -1708: Message not understood.

I now understand that the path of a record obtained by the script is an absolute path, while the path shown in DEVONthink’s inspector is indeed a relative one, so the regular expression should be adjusted.

I adjusted it to the following:

(() => {
  const app = Application("DEVONthink 3");
  const chrome = Application("Google Chrome");
  const rec = app.selectedRecords();
  if (rec.length === 0) return;
  rec.forEach(r => {
    const query = r.path().replace(/^\/Users\/yutang\/Databases\/Info\.dtBase2\/Files\.noindex\/html\//,"");
    const URL = `http://127.0.0.1:7026/static/${encodeURIComponent(query)}`;
    chrome.openLocation(URL);
  })
})()

And now the debug info is as follows:

app = Application("DEVONthink 3")
app.selectedRecords()
--> [app.databases.byId(2).contents.byId(45126)]
app.databases.byId(2).contents.byId(45126).path()
--> "~/Databases/Info.dtBase2/Files.noindex/html/b/cubox_export.html"
app = Application("Google Chrome")
app.openLocation(["http://127.0.0.1:7026/static/b%2Fcubox_export.html"])
--> Error -1708: Message not understood.
**Result:**
Error -1708: Message not understood.

http://127.0.0.1:7026/static/b%2Fcubox_export.html is really a URL that I can open directly in Chrome, so what do I need to adjust next? I suppose it’s almost done!

Well, if you have a look at the scripting dictionary for Chrome, there is no openLocation method (yeah, I hadn’t checked either this morning, assuming someone would have done it. Stupid). Obviously, ChatGPT didn’t look either, but just stammered something™. Another good reason to just forget about this crap and learn to script if you need it. There’s no shortcut, except the one that leads you directly into a dead end.

Something like

  const chrome = Application("Google Chrome");
  const tab = chrome.windows[0].tabs[0];
  tab.url = ...

does the trick.

OTOH: The replace does probably work in this case, but it’s not what one would do when using a regular expression. Instead, (and I’d said that before, hadn’t I?):
/^.*?html/

Why: Because your “RE” only works for one database, namely “Info”. It’ll fail for any other database, e.g. “Test” because then the base path would contain …Test.dtBase2…. And that’s why the regular expression is called “regular”: You use it to specify a pattern that is as specific as necessary and as broad as required. In this case, you’re looking for the beginning of the string (^), followed by any character (.) any number of times (*) followed by “html” literally. Now, this would gobble up the whole path because the filename ends with “.html”. That is called “greedy behavior” in a RE, and the question mark (?) turns it off. It basically says, “find as much as necessary followed by ‘html’ and then stop”. Which will terminate the search after the “html” in the path, just before the sub-folder “b”.
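
The difference between the greedy and the non-greedy form can be checked with a few lines of plain JavaScript. This is a minimal sketch; `samplePath` is a made-up path for a hypothetical “Test” database, not one taken from the thread.

```javascript
// Made-up path for illustration only.
const samplePath = "/Users/me/Databases/Test.dtBase2/Files.noindex/html/b/cubox_export.html";

// Greedy: ".*" runs on to the LAST "html", which is the file extension.
const greedy = samplePath.replace(/^.*html/, "");

// Non-greedy: ".*?" stops at the FIRST "html", which is the folder name.
const lazy = samplePath.replace(/^.*?html/, "");

console.log(JSON.stringify(greedy)); // → ""
console.log(JSON.stringify(lazy));   // → "/b/cubox_export.html"
```

Note that the non-greedy result still starts with a slash. If the goal is a URL ending in “b%2Fcubox_export.html”, that slash has to go as well before encoding, for example by matching “/html/” including both slashes.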

Aside: Script Editor is even worse than I thought. Instead of correctly displaying chrome in its output when a method of this app is called, it uses app again, although that’s clearly wrong and refers to another Application object. Why can’t they even get the simple things right?

Thanks for your detailed instructions! I followed the script you provided above and it works now! There is still one small problem: it would be even better if the URL opened as a new tab in the active window, because at the moment it replaces the first tab in the first window of Chrome. As for programming skills, to be honest, I have learned a little of the very basic JavaScript syntax (text formatting, regex, loops and iteration, etc.) but my knowledge is still very limited, so finishing even something like this is quite hard for me. Thank you for your patience and kind help. I’m much obliged!

Not to sound impolite, but that’s the point about learning: getting your feet wet. Yes, it takes time.
I got you started and provided you with some pointers. Now check out the scripting dictionary, my lovely site on JXA and use your favorite search engine – opening a new tab should be easy.
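
For readers who land here later, one possible shape of the new-tab variant is sketched below. It is an untested illustration, not the script from this thread: `openInNewChromeTab` is a hypothetical name, the code only runs as JXA (Script Editor with the language set to JavaScript), and it assumes Chrome already has at least one window open and that its scripting dictionary lets a new tab object be appended to a window’s tabs.

```javascript
// JXA sketch (assumption: run via Script Editor or osascript -l JavaScript, not in a browser or Node).
function openInNewChromeTab(url) {
  const chrome = Application("Google Chrome");
  // Assumption: windows[0] is the frontmost window and it exists.
  const frontWindow = chrome.windows[0];
  // Create a new tab object with the target URL and append it to that window.
  const tab = chrome.Tab({ url: url });
  frontWindow.tabs.push(tab);
  // Bring Chrome to the front so the new tab is visible.
  chrome.activate();
}
```

Inside the forEach loop of the earlier script, the assignment to tab.url would then be replaced by a call such as `openInNewChromeTab(URL)`.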


That’s a good point. I wrote it specifically for the “Info” database and didn’t care much about that, because I have only two databases and all the HTML files are in “Info” for now. Of course I could write the pattern to match all databases, but it works well for me as it is. Although I’m not that familiar with programming, I can generally use regex quite freely, though the results might sometimes look kind of stupid.

It probably does. But it’s still not a good regular expression:

  • It’s useless for other people who look in the forum because it works only for you
  • It requires far too much typing because you have to escape all the slashes and dots
  • Which in turn makes reading and understanding the expression more complicated.
  • It distracts from the main point, namely, removing everything up to and including the first occurrence of “html”.

Yeah, you’re right, I’ll do that next time :rofl: Another point is that English is not my first language, so this gets kind of complicated for me, especially when a lot of technical terminology is involved. Anyway, thank you so much for your help!!! :smiling_face_with_three_hearts: