OpenAI Integration with DT3 through JXA Script

Below and attached are Version 1.0 of a JXA script integrating OpenAI with DEVONthink.

As we have discussed extensively on the forum, AI has tremendous potential to help with document analysis that many of us are involved with. ChatGPT/OpenAI and this script are in early stages. Do not rely on the software without verifying the facts yourself. Above all do not upload any private/privileged documents to OpenAI.

That warning aside, I am quite intrigued by what OpenAI is presently capable of, and I have found the script to be notably helpful in improving my workflow when I read legal and medical literature.

Of note:

(1) To use the script you need to edit it to add your OpenAI API key.

(2) As the script uses the GPT-3.5 Turbo model, costs are likely to be negligible for most users.

(3) There is a waiting list for the GPT-4 API. Once the script is eventually updated or edited for that, it should be able to handle longer documents and should operate much faster, albeit at higher cost. I suspect that for most users the GPT-3.5 Turbo model will be fine for the foreseeable future.

(4) The script lets you select a DT3 group and then recursively parses your nested tree of DT3 groups. It allows you to summarize documents based on a set of pre-defined prompts or one that you choose instead. It then outputs a report in the same group you initially chose. A sample report is below.

(5) As there are token limits on GPT-3.5 usage, it cannot read an entire document. Instead it reads the first 4,000 characters [a number you can change], or alternatively lets you start reading at a user-chosen Start Word. As I often read academic literature, I have set the default Start Word to Abstract.
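The excerpt logic in item (5) can be sketched in plain JavaScript. The function and variable names here are illustrative only, not taken from the script itself:

```javascript
// Find the start word case-insensitively and take a fixed number of
// characters from there; fall back to the beginning if it is absent.
// extractExcerpt and its parameter names are illustrative.
function extractExcerpt(text, startWord, charCount) {
    const lower = text.toLowerCase();
    let start = startWord ? lower.indexOf(startWord.toLowerCase()) : 0;
    if (start === -1) start = 0; // start word not found
    return text.substring(start, start + charCount);
}

// Example: begin at "abstract" and take 40 characters
const doc = "Title page text. Abstract: We study the effect of X on Y.";
console.log(extractExcerpt(doc, "abstract", 40));
```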

Please share any feedback, suggestions, bug reports, or revisions/improvements to the script. AI technology is moving quickly, so it is likely that this sort of script will remain a moving target for quite a while.

Thanks to everyone who has been a source of education and inspiration on what can be accomplished with scripting - including but not limited to @chrillek, @cgrunenberg , @BLUEFROG , @ryanjamurphy , and @pete31 . My code no doubt is not pretty or efficient compared with any of theirs - but it works and I am progressing with JXA.


/*


DT3 Summary Script - by Richard S. Kaplan, M.D.  rkaplan@kaplanlifecareplan.com

This script will recursively parse nested DT3 Groups and summarize documents using GPT-3.5 Turbo and a prompt of the user's selection

Due to GPT token size limits, the script starts at the user-selected "Start Word" and continues for CharCount more characters (default 4000)

Set your OpenAI API Key below

*/




(() => {
    'use strict';

   
    const a = Application.currentApplication();
    a.includeStandardAdditions = true;

    const a2 = Application("DEVONthink 3");
    a2.includeStandardAdditions = true;
	

    const my_api_key = 'sk-XXXXXXXXXXXX'; // set your OpenAI API key here

    const promptChoices = ["Summarize at a High Level", "Summarize in 1 Sentence", "Summarize for a 5 Year Old", "Summarize in Bullet Points", "Critique and Rebut", "Brief Summary and Brief Critique - Important: If you speculate without full objective data then say so", "Other"];

    const CharCount = 4000; // number of characters sent to the model

    const group = GetGroup();
    const prompt = GetPrompt();
    const StartWord = GetStartWord(); // lower case only, or blank to ignore

    var eprompt = "";

    CreateSummaryFile(group, prompt);

    a.displayAlert("AI Summary Complete", {message: "See Group " + group.name()});

    return;

    // Send a single prompt to the OpenAI chat completions endpoint via curl
    function AIQuery(queryprompt) {
        var answer;

        if (queryprompt != "") {
            let shellscript = "curl https://api.openai.com/v1/chat/completions -H \x22Content-Type: application/json\x22 -H \x22Authorization: Bearer " + my_api_key + "\x22 -d \x27{ \x22model\x22: \x22gpt-3.5-turbo\x22, \x22messages\x22: [{\x22role\x22: \x22user\x22, \x22content\x22:\x22 " + escape(queryprompt) + "\x22}], \x22temperature\x22: 0.7 }\x27";

            let openairesult = a.doShellScript(shellscript);
            let parsedresult = JSON.parse(openairesult);

            answer = parsedresult.choices[0].message.content;
            // Convert newlines to HTML line breaks for the report
            answer = answer.replace(/(?:\r\n|\r|\n)/g, '<br>');
        }
        else {
            answer = "Query Blank";
        }

        return answer;
    }


    // Let the user pick a predefined prompt or enter a custom one
    function GetPrompt() {
        var prompt = a.chooseFromList(promptChoices, {
            withPrompt: "Enter a prompt:",
            defaultItems: ["Other"]
        });

        if (prompt == "Other") {
            var prompt2 = a.displayDialog("Enter Custom Prompt: ", {
                defaultAnswer: "",
                withIcon: "note",
                buttons: ["Cancel", "Continue"],
                defaultButton: "Continue"
            });
            prompt = escape(prompt2.textReturned);
        }

        return prompt;
    }


    // Ask for the word at which summarization should begin (case-insensitive)
    function GetStartWord() {
        var sw = a.displayDialog("Enter Start Word (Enter = Default, Blank = Start at Beginning of Document): ", {
            defaultAnswer: "abstract",
            withIcon: "note",
            buttons: ["Cancel", "Continue"],
            defaultButton: "Continue"
        });
        return sw.textReturned.toLowerCase();
    }

    // Let the user choose the DEVONthink group to summarize
    function GetGroup() {
        return a2.displayGroupSelector("Choose a Group for Summarization");
    }

    // Create an HTML summary record in the chosen group, then walk its contents
    function CreateSummaryFile(sumgroup, sumprompt) {
        var uprompt = unescape(sumprompt);
        const NewRecord = a2.createRecordWith({"name": uprompt + ' - ' + sumgroup.name(), "type": "HTML"}, {in: sumgroup});

        const documents = sumgroup.children; // was group.children; use the parameter

        NewRecord.source = "<center><font size=5 color=blue><b>" + uprompt + ' - ' + sumgroup.name() + "</b></font></center><br><br><font size=3 color=black>";

        ParseDocumentTree(documents);

        // Recurse into subgroups; summarize each non-group document
        function ParseDocumentTree(documentset) {
            for (let i = 0; i < documentset.length; i++) {
                if (documentset[i].type() == 'group') {
                    ParseDocumentTree(documentset[i].children);
                }
                else {
                    NewRecord.source = NewRecord.source() + '<b>' + documentset[i].name() + "</b><br>";

                    if (documentset[i].url()) {
                        NewRecord.source = NewRecord.source() + '<a href="' + documentset[i].url() + '">Internet Link to Document</a><br>';
                    }

                    NewRecord.source = NewRecord.source() + '<a href="' + documentset[i].referenceURL() + '">' + documentset[i].referenceURL() + '</a><br>';

                    // Find the Start Word (if present) and take CharCount characters from there
                    let plainT = documentset[i].plainText().toLowerCase();
                    let plainI = plainT.indexOf(StartWord);
                    if (plainI == -1) { plainI = 0; }

                    eprompt = prompt + ': ' + escape(plainT.substring(plainI, plainI + CharCount));

                    NewRecord.source = NewRecord.source() + AIQuery(eprompt) + '<br><hr><br><br>';
                }
            }
        }
    }

})();

DT3 AI Summary.zip (5.8 KB)


Seems to be a very interesting and useful approach; I will test it… If I understand correctly, it can only assess 4,000 characters of a PDF, which would mean only part of a scientific publication? Isn't this a major limitation in your use case?
Do you know this open-source project? It seems to allow analyzing whole PDFs (by splitting them) and promises less hallucination: GitHub - bhaskatripathi/pdfGPT: PDF GPT allows you to chat with the contents of your PDF file by using GPT capabilities. The only open source solution to turn your pdf files in a chatbot!

Might be great to integrate this into the workflow, maybe even with the possibility to enter a prompt with questions or to read a text file with a pre-defined prompt from a DEVONthink folder. Unfortunately I have no coding experience, so I can only contribute ideas… I am aware of the limitations and also the privacy issues of GPT; for me it's about trying out this new technology to get a feeling for its possibilities and limitations (without using it on sensitive files).

Greetings @vinschger

The pdfGPT project is interesting and worthwhile, though it has a different goal: its aim is to create a chatbot for asking questions about a PDF. No question that is a useful goal; my script, on the other hand, creates summary documents to give you an overview of a set of documents in DT3.

I wish the 4,000 character limit did not exist, but it is not as much of a limitation as it seems if the use case is to organize/summarize academic papers, since that's a pretty generous limit for the abstract of a paper. Certainly the abstract alone is not sufficient if the goal is to comprehend a subject and offer advice on it; but for purposes of searching academic literature, 4,000 characters turns out to be an enormous step up from tagging, MeSH classification, the Dewey decimal system, and other previous ways of searching for information.

Keeping to 4,000 characters and using GPT-3.5 Turbo makes the cost minimal, so you can easily use this script to organize/search the hundreds or even thousands of PDF files you may have lying around your computer. On the other hand, running AI on the full text of your hard drive is likely to become cost-limiting enough that you would use it only selectively.

Obviously my script also works well for use cases other than academic documents with an abstract; if you try it on a set of random OCR'd documents of any sort, you may be surprised how useful it is for search/overview purposes to summarize simply the first 4,000 characters. Having such a summary in one document covering every document in your group/subgroup turns out, for me at least, to be much more useful than just a list of filenames.

For the more important cases where running AI on more than just 4,000 characters is needed, there is a waitlist currently for the GPT-4 API. I do plan to make that an option for the script when I come up on the waiting list, but if you compare the pricing on GPT-3.5 vs GPT-4 I think only special documents will justify the GPT-4 approach.

Also, I find my script (with its 4,000 character limit) to be particularly helpful in summarizing an RSS feed, whether on an academic topic, mainstream news, or any other subject. I find it quicker and more informative to view the summary list in that format than a list of headlines I have to click through for details.

Thanks so much for your detailed reply. Sounds interesting! I am fascinated by your approach and script… please share further scripts here if you do more of them… :slight_smile: Thank you!

I am pondering the question of how to summarize longer documents.

I believe it would be possible to have a script break up a large document into individual sections about 4,000 characters each and summarize each of those sections.

For some purposes that alone might be sufficient.

It would also be possible to then combine those summaries and repeat the process again.

Essentially a “recursive summary algorithm.”

It would be interesting to see how well that works.
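The recursive idea above can be sketched as follows. `summarize` here is a placeholder for a real model call, and it must return something shorter than its input for the recursion to terminate; the chunk size matches the 4,000-character limit discussed:

```javascript
// Sketch of a "recursive summary algorithm": split into ~4,000-character
// chunks, summarize each, then summarize the joined partial summaries
// until the text fits in one request. `summarize` stands in for an API call.
const CHUNK = 4000;

function chunkText(text, size) {
    const chunks = [];
    for (let i = 0; i < text.length; i += size) {
        chunks.push(text.substring(i, i + size));
    }
    return chunks;
}

function recursiveSummary(text, summarize) {
    if (text.length <= CHUNK) return summarize(text);
    const partials = chunkText(text, CHUNK).map(summarize);
    return recursiveSummary(partials.join("\n"), summarize);
}
```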

My understanding is this is basically what LangChain does. You aren't limited to summarizing; you can also ask the AI questions about the document.

Interesting - you may be right

I briefly looked into it a while ago, but it seemed pretty complex and made all sorts of assumptions about prior AI knowledge/terminology that I do not have.

I suspect I would prefer something more user-friendly rather than fiddling with Python code and Jupyter notebooks.

https://python.langchain.com/en/latest/use_cases/summarization.html

I am morally certain there will be consumer products shortly. That said, the docs aren't half bad; they basically walk you through setting up the notebooks, and if you have any familiarity with Python I expect it's within your ability, if not necessarily your comfort zone. Regardless, I thought it was interesting that you were approaching their solution. What they actually do, as I understand it, is chunk the text, index it, and feed the AI relevant chunks based on searching the index. So a little more cumbersome than the progressive summarization you were talking about.
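A toy illustration of that chunk-index-retrieve pattern (real systems such as LangChain use embedding vectors; this uses simple word overlap purely for illustration):

```javascript
// Split a document into chunks, then score each chunk by how many query
// words it contains and return the best matches. A crude stand-in for the
// embedding-based retrieval real systems use.
function buildChunks(text, chunkSize) {
    const chunks = [];
    for (let i = 0; i < text.length; i += chunkSize) {
        chunks.push(text.substring(i, i + chunkSize));
    }
    return chunks;
}

function relevantChunks(chunks, query, topK) {
    const words = query.toLowerCase().split(/\s+/);
    return chunks
        .map(c => ({ c, score: words.filter(w => c.toLowerCase().includes(w)).length }))
        .sort((a, b) => b.score - a.score)
        .slice(0, topK)
        .map(x => x.c);
}
```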

You might look at their docs for advice about how to automate cutting your documents down into digestible bits.

Also, I think it's 4,000 tokens?

I do not know of a way to calculate tokens on the fly, and thus do not know how to set my script's input right at the OpenAI limit. So I went by characters, which are easy to count in JavaScript, and 4,000 seemed like a good number by trial and error, one at which requests are not rejected. But it's easy to change that to a different number in the script if you wish.

I saw some discussion on the OpenAI Forum about how to calculate tokens but it did not seem as if there is a clear-cut formula to use. Does anyone know of such a method?

There are libraries to do this programmatically in Python (see e.g. GitHub - openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models), but in JXA I don't know of one. It's not a simple formula.
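Short of a real tokenizer, a common rule of thumb from OpenAI's own guidance is that one token is roughly four characters of English text, which gives a crude estimator:

```javascript
// Crude token estimate: ~4 characters per token for English text.
// Only a heuristic; exact counts require a BPE tokenizer like tiktoken.
function estimateTokens(text) {
    return Math.ceil(text.length / 4);
}

// Trim text so its estimated token count stays within a budget
function trimToTokenBudget(text, maxTokens) {
    const maxChars = maxTokens * 4;
    return text.length <= maxChars ? text : text.substring(0, maxChars);
}
```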


“This is the way.” I'm working hard to bring all my documents into an AI system. The process is already done with my Obsidian archive, and I'm having great and surprising results. But there it's quite simple because it's A LOT smaller. The real huge challenge is to index all the documents I've saved and continue to save in DEVONthink. Here the database is really, really big, and the format is less homogeneous than in Obsidian. Anyway, I understand that having all my DT databases indexed to interact with an AI “engine” (right now I'm using OpenAI, but in the future I don't know; the LLM could also be deployed locally) could be an incredible opportunity. I hope the coding team here will incorporate true AI technologies as soon as possible. For now, thanks @rkaplan for your fundamental proof of concept. It seems to me a really important indication of a promising new way.


@rkaplan please contact me via email as soon as possible; I can't send you a private message on this forum and I can't find your email. My email is fabio () rapposelli dot org