A Proposal for the Integration of DEVONthink and ChatGPT API

Hi, I’ve been working with LLMs since their inception in 2018. For quite some time I would have backed your sentiment 100%. There was no value in integrating them with DT, mostly because they were super hard to use (they required training data). This changed with GPT-3’s zero-shot ability, but it was still hard to use due to the need for prompt engineering. The release of ChatGPT fixed all of those downsides.

But I agree that the use cases adopted by Notion AI or Craft are poor as they focus on content generation. In DT we have plenty of other tasks that can be automated using ChatGPT.

Consider this simple use case. For every document that goes to the Inbox, we would call ChatGPT with the following prompt:

Update this json: {"file_name": "invoice.pdf", "tags": [] , "summary":null} with better file name, tags suggestion and summary. Output only json and nothing else.
Tags I have: Private, Business, Invoice / Bill, Small, Large, Law, Subscription, On time payment, …
Based on the following content of the document:

Here is what ChatGPT returned for the copied content of an invoice from Apple:

{"file_name": "apple_subscription_invoice.pdf",
"tags": ["Private", "Business", "Invoice / Bill", "Subscription", "On time payment", "Small"],
"summary": "This is an invoice for an annual subscription of Craft - Docs and Notes Editor purchased through Apple App Store. The invoice was issued on 31 May 2022 with a total amount of 189.99 zł inclusive of VAT at 23%."}

We would then present some interface for accepting ChatGPT’s recommendations during the review process.
It would be super useful if users could edit such prompts, for example to include examples of their own.
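To make the idea concrete, here is a minimal sketch in Python of the two pieces around the model call: building the prompt quoted above and checking the JSON that comes back. The function names `build_prompt` and `parse_suggestion` are made up for illustration, and the call to the model itself is left out on purpose. Dropping tags the user does not already have is an extra safeguard I am assuming here, not something the prompt guarantees.

```python
import json

def build_prompt(file_name, known_tags, content):
    """Assemble the prompt from the post for a single document."""
    template = {"file_name": file_name, "tags": [], "summary": None}
    return (
        f"Update this json: {json.dumps(template)} with better file name, "
        "tags suggestion and summary. Output only json and nothing else.\n"
        f"Tags I have: {', '.join(known_tags)}\n"
        "Based on the following content of the document:\n"
        f"{content}"
    )

def parse_suggestion(reply, known_tags):
    """Parse the model reply and keep only tags the user already has."""
    data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = set(known_tags)
    return {
        "file_name": str(data.get("file_name", "")),
        "tags": [t for t in data.get("tags", []) if t in allowed],
        "summary": data.get("summary"),
    }
```

The parsed result would only be a suggestion; the review interface would still decide what actually gets applied to the document.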

If you are worried about privacy etc., we are going to have commercial open-source models pretty soon that can run on consumer hardware. I was able to run Alpaca 7B on my MacBook Air with super fast performance.

Let me know what you think and whether there is any chance that you would reconsider the hard “no” for this technology. I would love to talk about other use cases like search, data extraction, summarisation of long documents, etc. I think we could easily create something like Phind or Bing Chat that uses our databases of documents instead of the internet. (If you haven’t yet tried ChatGPT-style search, try Phind’s expert mode.)

Here is the chatgpt conversation:

Thanks for the interesting information!

There isn’t a hard “no”.
There’s just not an “Of course! That’s the greatest idea ever!!!” :wink:

Thank you for the interesting use case! We’re actually aware of the possibilities & risks and trying out various options but no promises.

Which data do you actually have in mind?

My 2c. But first, for context. I’m a long time DEVONthink user and fan. If I’m cranky on here about issues it’s just my personality; please know that I am frankly in awe of the scope and endurance of this project. Dt, Emacs, and Firefox are basically the only software I regularly use.

ALSO, I have a very strong personal and professional interest in what’s going on with AI. I spend a lot of my recreation time downloading models from Hugging Face and chatting with OpenAI’s models, both through the browser and via the API.

ALL THAT SAID:

The situation here is fluid and moving quickly. We are at a point where it is probably trivial to set something up scripting wise for Dt to interact with ChatGPT. But it’s an open question what things are going to look like in 6 months, a year, etc. This tech may end up being locked down so that only big players like MSFT can incorporate it into their software. Or we may be looking at a world where good enough models are running on everyone’s box as part of the OS. Who knows. In the meantime, I would not advise anyone to build a business model around assumptions about what OpenAI’s api is going to look like pricing or access wise in 2024.

See from @thekok

Meanwhile, I’m enjoying using the SmartConnections chatgpt plugin for Obsidian:

I have not seen anyone suggest a business model that depends on OpenAI.

But pretty soon software that does not integrate with AI may be at a disadvantage. That is clearly the way the industry is moving at present.

The entire thread is about integrating their api with Dt, so I’m kind of confused by your comment. In any event, the way the industry is moving I think not having some kind of rushed to market integration with an LLM is going to be a positive differentiator soon.

I am very excited about the potential of these models. But the current context window size and per token pricing scheme puts a pretty hard limit on what is useful to do with the 10G+ buckets of pdf we’re all lugging around. And that’s what we’re all dreaming about. The “write a letter to mom” stuff is going to be baked into your favorite text editor by the end of the month, if it hasn’t been already.

A little confused about this Obsidian plugin. It uses the user’s api key, but advertises GPT-4. Surely that’s available only if the user has api access to 4? I’m still on the wait list for the api, and they appear to be rolling it out pretty slow; they put a cap on ChatGPT usage.

Not to curb your enthusiasm, but I seem to remember “the industry” moving towards Blockchain before. And towards XML. And towards XHTML. And to SOAP.
Just because something is talked about a lot does not mean that it is financially or technically viable.
That’s not to say that these text analyzing programs are not going to take off. But “industry” interest in them is not a strong argument for their eventual success, IMHO.

If I’m wrong, someone please correct me, but as I understand it the issue is the context window, which is currently vanishingly small for GPT-3 and a bit bigger for GPT-4 (the 32k version seems huge rn, but…). The way “chat with your pdf” and the Obsidian plugin linked above work is that they put a larger text corpus in an indexed database, translate your chat prompt into a search on the database, and then feed only the search results to the LLM, i.e., much smaller snippets of text. For some use cases this could be sort of useful. But I currently cannot, for example, ask the LLM to interact with an entire legal opinion meaningfully. Much less the average user’s Dt database.
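For anyone who wants to see the shape of that approach, here is a rough Python sketch under heavy simplifying assumptions: plain word overlap stands in for whatever index or embedding search the real tools use, and all function names are hypothetical. It only shows the flow of chunking the corpus, picking the few chunks that match the question, and building a prompt from just those.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(text, size=50):
    """Split a long text into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_chunks(question, chunks, k=3):
    """Return up to k chunks sharing the most words with the question."""
    q = Counter(tokenize(question))

    def score(c):
        words = Counter(tokenize(c))
        return sum(min(n, words[w]) for w, n in q.items())

    ranked = sorted(chunks, key=score, reverse=True)
    return [c for c in ranked[:k] if score(c) > 0]

def build_context_prompt(question, snippets):
    """Only the selected snippets, not the whole corpus, reach the model."""
    context = "\n---\n".join(snippets)
    return f"Answer using only these excerpts:\n{context}\n\nQuestion: {question}"
```

The limitation described above is visible in the code: the model never sees anything outside the `k` selected snippets, so it cannot reason over a whole document.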

I think I’m the one with the curbed enthusiasm here, and hard agree.

That stuff was theorized by the computing media. But very few apps actually added those features.

With AI take a look at ProductHunt - the number of new and existing apps with AI features is stunning.

Yes that is an issue. Short-term you can work around that with recursive summarization. GPT-4 will also have a larger context window. And surely it will continue to grow.
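A minimal sketch of what recursive summarization means here, in Python. The `summarize` argument is a placeholder for a call to whatever model is used, and it is assumed to return something shorter than its input; `max_words` is a crude stand-in for the real token limit.

```python
def recursive_summarize(text, summarize, max_words=1000):
    """Summarize text of any length with a summarizer that has a size limit."""
    words = text.split()
    if len(words) <= max_words:
        return summarize(text)
    # Split into pieces that fit, summarize each piece, then repeat on the
    # joined summaries until the whole thing fits in one call.
    pieces = [" ".join(words[i:i + max_words])
              for i in range(0, len(words), max_words)]
    partial = " ".join(summarize(p) for p in pieces)
    if len(partial.split()) >= len(words):
        raise ValueError("summarize() must shorten its input")
    return recursive_summarize(partial, summarize, max_words)
```

Each pass loses detail, so the final result is a summary of summaries rather than a reading of the full document.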

No question a context window that can hold an entire PDF is the holy grail for many uses and that has not arrived yet. But still there are significant uses of AI at present.

If legal applications are of interest, certainly current AI is advanced enough that you are at a disadvantage against your opponents if you do not use AI in addition to standard legal research techniques. It’s not advanced enough at this point to replace Lexis and other traditional legal tools; but at the same time it clearly has capabilities right now which surpass anything else out there for legal research.

[For legal research BTW - Bing AI Chat is far superior to ChatGPT since it has access to the Internet and gives references. Perplexity.AI integrates ChatGPT with the internet and is another option. The soon to be widely released plugins to ChatGPT may turn out to be superior to either of those options.]

Obviously false. Betting the farm on AI is of course a bad strategy. But software is going to be dinged for not simply having the option to interface with AI.

Like so much other technology, the tool itself is neither good nor bad; it’s a matter of how it is used. Give the user the credit and option to choose what is best for his use case.

As I said, I am very interested in AI and am actively exploring its use. On the other hand, I don’t think my contention that the market is about to be flooded with poorly thought out, rushed integrations to the api is “obviously false.”

To the legal point, historically my main advantage against my adversaries in litigation has been the fact that I take the time to actually read and understand cases in their entirety and in context instead of just looking at whatever section of the opinion the search engine points me at, and let’s just say I don’t expect that to change any time soon as a result of AI uptake given current constraints. And currently, the problem with scaling the context window has been that the resource use of standard attention scales quadratically with context length, although I think there are interesting results out there that may change that soon. Re progressive summarization, I’m extremely dubious, although some of the stuff re AI-assisted context input compression looks very interesting.

I like perplexity’s results a lot more than Bing, but my understanding is both use more or less the same OpenAI product? Bard is the differentiated product in search. I have access to the web plugin for ChatGPT as well and it is…slow. Even without search you can get some pretty impressive results, although the truth is what you are getting is basically a summary of blog posts on law firm websites as opposed to anything I would call actual legal research. And I am already seeing evidence that they are training the open access models not to answer legal questions, with I’m sure a “safety” justification but it’s difficult not to be aware that they are working on products to monetize the AI for legal research and don’t want the competition from an open model.

Sure some of the AI integrations are likely of questionable use, but surely that is not a negative for the software - just don’t use it if the feature is not helpful to you. I do agree though that in the software market where AI is 99% of the purpose, e.g. AI “blog writers” to write 5,000 blog posts a day, many of those will fail - as they should.

I agree 10,000% about reading cases in context and in detail… FWIW, I am not an attorney but rather a subject expert in litigation. I find that Bing and Perplexity (and some custom scripts I have written for my specific use cases) help me find medical or legal citations that I otherwise would not have found or would not have found as quickly. Yes, I still have to read them completely, and sometimes they turn out not to be useful, just as with a Google, Lexis, or PubMed search.

I like the Perplexity.AI interface as well but it operates by feeding finite results of a Bing web search into ChatGPT. Bing AI Chat on the other hand has access to all of the Bing search database. So I find on obscure points Bing AI Chat tends to be more likely to find stuff that Perplexity.AI does not.

Yes it is all changing and there may turn out to be business reasons to stratify access to data. Though it may become more costly, from a business/professional perspective if someone can make professional search data more easily/accurately available then I would be glad to pay that cost; the ROI no doubt will be worthwhile.

Interesting re the distinction between Perplexity and Bing, I was wondering why they give such different results. Can you say more about what you are using this for? Do you work as a retained expert in litigation matters relating to e.g. some sort of engineering specialty?

I am a retained expert in medicine related to future treatment needs of individuals involved in catastrophic injuries (“life care planning”) and related to estimation of life expectancy.