How to put the text of an article of a URL in a variable?

Hi, I’m trying to get the text of an article from a URL and put that text in a variable. The code is below:

(() => {
	
	
	const DTapp = Application("DEVONthink 3");
  	DTapp.includeStandardAdditions = true;
  		
  	const selection = DTapp.selectedRecords();	
  	selection.forEach(r => {

        let theURL = r.url;
        let theMarkup = DTapp.downloadMarkupFrom(theURL);
        let theText = DTapp.getRichTextOf(theMarkup);
		 
	});
	
})();

But I get the following error in the Script Editor:

What am I doing wrong?

You’re using the object specifier r.url instead of calling the method r.url(), which returns the actual URL string.

And you shouldn’t use let for values that you don’t modify. Use const for them instead.

Prefixing variables with “the” is not helpful. Every noun takes a “the” in English. That’s three senseless characters you have to type and read all the time.
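
To make the points above concrete, here is a minimal sketch of the loop with the property called as a method and const used throughout. The helper name articleText is my own invention; downloadMarkupFrom and getTextOf are the DEVONthink commands already used in this thread. The typeof guard is only there so the helper can also be exercised outside Script Editor.

```javascript
// Hypothetical helper (the name is illustrative): fetch the plain text behind a record's URL.
// `dt` is the DEVONthink application object, `record` is one selected record.
function articleText(dt, record) {
  const url = record.url();                  // call the property to get the actual string
  const markup = dt.downloadMarkupFrom(url); // HTML source of the page
  return dt.getTextOf(markup);               // plain text extracted from that HTML
}

// This part only runs inside a JXA host such as Script Editor, where Application exists.
if (typeof Application !== "undefined") {
  const dt = Application("DEVONthink 3");
  const texts = dt.selectedRecords().map(r => articleText(dt, r));
  console.log(texts.length + " article(s) fetched");
}
```

Passing the application object into the helper keeps it free of globals, which also makes it easy to try with stand-in objects.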

Apart from what @chrillek has said, you’re attempting to put rich text (RTF format) into a JavaScript variable. This would probably not produce the desired results. A better approach is to create a bookmark of the URL and convert it to RTF.

I tried it with a URL and just got a normal string with some special characters (but all plain text). That doesn’t do any harm, and neither does putting RTF (which is just text, too) into a JavaScript variable.

Alternatively, there is getTextOf(), which should get just the text from the HTML.

Yeah, it’s not going to produce an error. The OP would be disappointed though if they intend to preserve some formatting of the article using getRichTextOf().

I would ask why an automation is being used versus e.g., using the browser extension or services.

If someone wants to process several records, the automation might be more convenient than loading them one by one in the browser and saving the current page via the extension or services.

Another use case would be automatically retrieving the contents of a specific website every day. For example, you could set up an automation that retrieves stock prices, current news stories, the weather, or whatever else might be of interest on a daily basis.

Moreover, retrieving this as text rather than just bookmarking the website is helpful because you could extend the script to perform alert actions based on the content. For example, you could set it up to send yourself an email or a text any time a significant event happens in the stock market or with the weather.

And that’s why I’m asking the reason for it. Even to this day, despite copious forms of documentation, there are people who don’t know about the services or browser extension.

Thank you, solved!

I had tried getTextOf(), then changed to getRichTextOf(), and forgot to go back to getTextOf().

Actually, I just need the text; I’m trying to send it to the OpenAI API.

I’m trying to send the text of many bookmarks to the OpenAI API to get a summary, then save the API’s answer as Markdown.

It’s okay now: I can get the text and send it, but the script stays “Running” forever in Script Editor.

Here is the full code:

(() => {
	"use strict"
	
	const DTapp = Application("DEVONthink 3");
  	DTapp.includeStandardAdditions = true;

    const app = Application.currentApplication();
	app.includeStandardAdditions = true;
  	
	const userInput = "You are a researcher, make a summary with the first subtitle pointing out the 5 main points addressed, in the second subtitle explain the 5 main terms used, in the third subtitle explain 3 complex concepts in simple terms and in the fourth subtitle give an example in everyday life of how the subject can be applied. The text is as follows: ";
  	const my_api_key = "sk-projxxxxxx";

	
  	let selection = DTapp.selectedRecords();	
  	selection.forEach(r => {
        let aURL = r.url;
		let urlObject = r.url()
		let markup = DTapp.downloadMarkupFrom(urlObject);
		let text = DTapp.getTextOf(markup);
		
		let name = r.name;
		let thumb = r.thumbnail;
		let UUID = r.uuid;
		
		let prompt = userInput + text;
	
		let command = `curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${my_api_key}" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "assistant", "content": "You are a helpful assistant"}, {"role": "user", "content": "${prompt}"}]}'`
				
		let jsonObj = app.doShellScript(command);
		let obj = JSON.parse(jsonObj);
		let catalog = obj.choices[0].message.content;
	
		
		let title = name;
		
		let newSum = DTapp.createRecordWith({
			name: title,
			URL: aURL,
			type: "markdown",
			content: catalog,
			thumbnail: thumb
		});
				 
	});
	

 
})();

Test this line in a separate script file. It’s not going to return what you expect. You should call a method (e.g. r.name()) in order to retrieve information from r. The same goes for the other lines.
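
As a rough illustration of that advice, the sketch below reads a few properties by calling them and returns plain values. The helper name recordInfo is hypothetical; name, uuid and url are the record properties the script above already touches.

```javascript
// Hypothetical helper: read plain values from a record by calling each property.
function recordInfo(record) {
  return {
    name: record.name(), // without the parentheses this would be an object specifier
    uuid: record.uuid(),
    url: record.url()
  };
}

// This part only runs inside a JXA host such as Script Editor.
if (typeof Application !== "undefined") {
  const dt = Application("DEVONthink 3");
  const infos = dt.selectedRecords().map(r => recordInfo(r));
  console.log(JSON.stringify(infos, null, 2)); // shows real strings, not specifiers
}
```

Logging the result as JSON is a quick way to confirm that each field holds a string before it is used anywhere else.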

After that is fixed I get this error:

image

That error is probably due to an empty, incomplete or malformed JSON response. The author should program defensively, using something like

const obj = JSON.parse(jsonObj);
let catalog;
if (obj && obj.choices && obj.choices.length > 0 && obj.choices[0].message) {
  catalog = obj.choices[0].message.content;
}
if (catalog) {
  // process catalog
}

And they should, of course, use const whenever a variable is not going to change (instead of let).
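
A slightly more defensive variant of the same idea, as a sketch only: it also catches the case where the response is not JSON at all, and it reports the message of an error-shaped response instead of silently yielding nothing. The helper name extractContent and the shape of its return value are my own choices, and the optional chaining syntax assumes a reasonably recent macOS JavaScript engine.

```javascript
// Hypothetical helper: pull the reply text out of a chat completions response.
// Returns { content } on success, or { error } with a readable reason.
function extractContent(raw) {
  let obj;
  try {
    obj = JSON.parse(raw);
  } catch (e) {
    return { error: "Response is not valid JSON: " + e.message };
  }
  // When a request fails, the API sends an "error" object instead of "choices".
  if (obj?.error) {
    return { error: String(obj.error.message ?? "unknown API error") };
  }
  const content = obj?.choices?.[0]?.message?.content;
  return content ? { content } : { error: "No choices in response" };
}
```

In the script this could be used as const result = extractContent(jsonObj); and the record would only be created when result.content is present.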

It looks like obj.choices produced undefined. That means the JSON data in the variable obj does not include the key choices. You might want to find out what actually is in the JSON for debugging purposes.

How do you do that in JXA (or, for that matter, in AppleScript)?

Debugging code such as:

DTapp.displayAlert(obj);

Results in an error of “Cannot convert types”

If you do not know the format of the JSON then how can you properly form a request to address its components?

return objectName; is a simple way to show what is in objectName, which will be shown in the Result pane of Script Editor. You can return basically any object you’d like to inspect. MDN reference

This will work only if obj is text. The displayAlert method is for telling you that something has happened (when you run the script as e.g. a menubar script within the DT app), not exactly for debugging.
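
Since the alert wants text, one way around the “Cannot convert types” error is to turn the object into a string first. This is only a sketch: the helper name asText is made up, and the sample object is invented purely to have something to show.

```javascript
// Hypothetical helper: turn any parsed JSON value into readable text.
function asText(value) {
  if (typeof value === "string") return value;
  const json = JSON.stringify(value, null, 2); // pretty-print with two-space indent
  return json === undefined ? String(value) : json; // stringify yields undefined for undefined
}

// Invented sample, standing in for whatever JSON.parse returned.
const sample = { error: { message: "example only" } };

// This part only runs inside a JXA host such as Script Editor.
if (typeof Application !== "undefined") {
  const app = Application.currentApplication();
  app.includeStandardAdditions = true;
  app.displayAlert("Parsed response", { message: asText(sample) });
}
```

Returning asText(obj) from the script instead of showing an alert puts the same text into the Result pane.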

Thanks - that gives me this:

When I dismiss the popup, the menu options at the bottom of Script Editor are gone:

image

It does something similar with every website I try.

The error messages seem to indicate that the curl command was not properly quoted/escaped (an error message as text would be easier to read, btw).
This is the curl string in the script:

`curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${my_api_key}" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "assistant", "content": "You are a helpful assistant"}, {"role": "user", "content": "${prompt}"}]}'`

A first, easy step might be to examine that with

console.log(`curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${my_api_key}" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "assistant", "content": "You are a helpful assistant"}, {"role": "user", "content": "${prompt}"}]}'`);

I guess (!) that prompt contains quotes (double or single) that throw off the whole thing.
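
If quotes in prompt are indeed the culprit, one possible fix is to let JSON.stringify build the request body and then quote that body for the shell, rather than pasting the text into a template string. The sketch below is untested against the real API. The helper names shellQuote and buildCurlCommand are hypothetical, and I used the system role for the instruction message, which differs from the script above.

```javascript
// Quote a string for a POSIX shell: wrap it in single quotes and
// replace each embedded single quote with '\'' (close, escaped quote, reopen).
function shellQuote(s) {
  return "'" + String(s).replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: build the curl command with a JSON-encoded body.
function buildCurlCommand(apiKey, prompt) {
  // JSON.stringify escapes double quotes, backslashes and newlines in the prompt.
  const body = JSON.stringify({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant" }, // assumption: system role
      { role: "user", content: prompt }
    ]
  });
  return [
    "curl https://api.openai.com/v1/chat/completions",
    "-H " + shellQuote("Content-Type: application/json"),
    "-H " + shellQuote("Authorization: Bearer " + apiKey),
    "-d " + shellQuote(body)
  ].join(" ");
}
```

The result can be passed to app.doShellScript(buildCurlCommand(my_api_key, prompt)) in place of the hand-built string, and logging it first shows exactly what the shell will receive.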