A Proposal for the Integration of DEVONthink and ChatGPT API

Yesterday I tried to extract the amount, date, company & summary from poorly scanned invoices via ChatGPT. The result looked awesome at first, until I noticed that almost all of the facts were pure hallucinations.

That’s of course not surprising, because the only recognizable information in the really bad text layer of the PDF was the company (which was the one & only valid result) and the amount (which DEVONthink could successfully retrieve on its own, and much faster).

Isn’t that what the Search inspector does?

Just as there are techniques to use in a standard programming language, so too there are ways to devise a prompt that minimize this. Ironically, the techniques that work with prompting often feel counter-intuitive the more experienced one is with conventional programming languages. Hence the very odd situation of me giving you advice on how to instruct a computer.

It might be interesting for you to re-try your attempts by adding specific instructions in the prompt along the lines of “If you do not have enough information, leave the answer blank. Do not create or invent answers.”

Also, if you have a paid OpenAI account you are likely to find that GPT-4 is notably less likely to hallucinate than GPT-3. With a free OpenAI account, if you go to the Playground at https://platform.openai.com/playground instead of ChatGPT and reduce the Temperature setting, you will also find it is notably less likely to hallucinate; the closer the Temperature gets to 0, the more predictable/deterministic the output is.
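
Putting both suggestions together, here is a minimal sketch using the OpenAI Python library; the field names, model choice, and prompt wording are just assumptions for the invoice example above, not a tested recipe. The prompt explicitly permits blank answers, and the temperature is pinned to 0:

```python
# Minimal sketch: extract invoice fields while discouraging hallucination.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment;
# the field names and model are illustrative, not a tested recipe.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You extract fields from OCR text of invoices. "
    "Return JSON with the keys amount, date, company, summary. "
    "If you do not have enough information for a field, leave it blank. "
    "Do not create or invent answers."
)

def extract_invoice_fields(ocr_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",   # paid accounts; GPT-3.5 hallucinates more
        temperature=0,   # closer to 0 = more deterministic output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ocr_text},
        ],
    )
    return response.choices[0].message.content
```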

Actually I already tried different temperature settings, but lower values also increased the likelihood of getting no results at all (even when there should be some). Like any other threshold, whether that’s preferable depends on the user’s expectations and the data.

And yes, I’m currently testing the free account as that’s probably what most of our users (and especially the trial users) would initially use too until they’re convinced that a premium account would be worth the money.

Yes - and perhaps I was being too simplistic in my example.

One of the truly helpful capabilities of GPT-4 at present is its ability to define questions using natural language. So I can “search” for “Imaging Studies by Date” and get an HTML table, CSV, or JSON listing that includes CT, MRI, X-Ray, Ultrasound, and other imaging studies along with the date and location within the document. That of course goes way beyond what the Search inspector does and would be an absolute killer feature in DT3 if it could be implemented.

The catch at present, though, is that we are limited by the “context window”, or maximum document size. LangChain gets around that to some extent by allowing longer documents to be analyzed, but at least in my use so far it is not fast/efficient enough for a practical workflow. Maybe we have to wait for the AI context window to grow, or maybe someone smarter than I am can figure out a more efficient algorithm to execute a prompt on a large document.
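
The workaround LangChain and similar tools use can be sketched by hand: split the document into overlapping chunks that fit the context window, run the same question against each chunk, and merge the partial answers afterwards. A naive, untested version (the chunk size, model, and example query are placeholder assumptions) might look like this:

```python
# Naive map step for documents larger than the context window:
# split into overlapping chunks, ask the same question of each chunk,
# and collect the partial answers for a later merge pass.
# Chunk sizes, model, and the example query are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def chunk_text(text: str, size: int = 8000, overlap: int = 500) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def ask_each_chunk(document: str, question: str) -> list[str]:
    answers = []
    for chunk in chunk_text(document):
        response = client.chat.completions.create(
            model="gpt-4",
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "Answer only from the provided excerpt. "
                            "Return JSON; use an empty list if nothing matches."},
                {"role": "user", "content": f"{question}\n\nExcerpt:\n{chunk}"},
            ],
        )
        answers.append(response.choices[0].message.content)
    return answers

# e.g. ask_each_chunk(record_text, "List all imaging studies by date and location.")
```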

Fair enough. If by chance the documents are not confidential or you can put together a non-confidential example then I would be glad to try it for you on GPT-4 to show you the difference.

Does anybody know of other companies that have a similar product to DEVONthink that are interested in offering their customers a custom ChatGPT style interface for working inside their applications?

I continue to search for something that will help me correct about 1,200 video transcripts and refine the information contained in them into a personal knowledge base, so that I can (eventually) distill the corrected transcripts into something useful.

Here’s GPT4All, an open-source project that allows us to load our own data; it looks interesting. Ideally the system would even go so far as to create video compilations from the timings of all the text found to be relevant, let the user play back those clips while selecting and removing clips that do not match, and then rearrange them chronologically so they cover a certain topic across years of video recordings.
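
For what it’s worth, the GPT4All Python bindings make it easy to point a local model at your own transcript text. This is only a rough sketch; the model file name is just an example, and long transcripts would still need to be split to fit the local context window:

```python
# Rough sketch of running a local GPT4All model over a transcript excerpt.
# Assumes the `gpt4all` Python package; the model file name is just an example
# and is downloaded on first use. Long transcripts still need chunking.
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # example model, runs locally

def clean_transcript_excerpt(excerpt: str) -> str:
    prompt = (
        "Correct the punctuation and obvious transcription errors in this "
        "video transcript excerpt. Do not add or remove information.\n\n" + excerpt
    )
    return model.generate(prompt, max_tokens=1024, temp=0)
```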

I am interested in your video transcript project. How did you download them? I would be interested in the process. There is a body of YouTube videos that I would like to get transcripts from and index in DT3, in a way that lets me jump from the transcript to the exact spot in the video it references.

The Verge had an interesting conversation with Google’s Bard:

That’s again not to say that these programs are useless. They are just not intelligent.

I have another example of this lack of any “intelligence”. I used ChatGPT to create a prompt to generate lists of synonyms. But I asked it to limit its suggestions to words that were two syllables or greater. It said “no problem”, but then could not do it. It would give me single-syllable words. If I pointed this out, it would agree and apologize… but then still be unable to avoid single-syllable words.

The problem with Large Language Models is that they are good at mimicking language, which we easily interpret as knowledge. But there is no intelligence there – no ability to reason, process or compute. They are just fancy mimics. I would not trust an LLM to do any real work like summarizing a text accurately.

I use Keyboard Maestro (KM).

I create a macro that is limited to Terminal.
I tell it to trigger when I type yt/

Then I copy the YouTube URL (channel, list of URLs, playlist, etc.) to the clipboard.
I type yt/ and it expands the command as needed.

This will produce

  1. a folder with the name of the channel
  2. an MP4 file (you can change the format as you wish; ask GPT or Bard how, or check the docs on GitHub)
  3. a description file with the heading from YouTube
  4. a JSON file with the video information
  5. a transcript, if available

It works for Twitter, Facebook, YouTube, and others - basically any site that shows you a video in streaming format.

yt-dlp -o '%(uploader)s (%(uploader_id)s)/%(upload_date>%Y-%m-%d %a %b)s %(timestamp>%Hh%Mm%Ss)s YT %(title).150s - (%(duration>%Hh%Mm%Ss)s) [%(resolution)s] [%(id)s].mp4' --write-auto-sub --convert-subs=srt --write-description --write-info-json -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4' -U '%SystemClipboard%'
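
If you only need the transcripts (e.g. for indexing in DT3, as asked above), yt-dlp can also be driven from Python. A rough sketch, assuming English auto-generated subtitles exist and using an example output template:

```python
# Rough sketch: fetch only the subtitles (no video download) so the
# transcripts can be indexed in DT3. Assumes the `yt_dlp` package;
# the output template and language are example choices.
import yt_dlp

def download_transcripts(urls: list[str], out_dir: str = "transcripts") -> None:
    options = {
        "skip_download": True,            # subtitles only, no video file
        "writesubtitles": True,           # uploader-provided subtitles
        "writeautomaticsub": True,        # auto-generated subtitles as fallback
        "subtitleslangs": ["en"],         # assumed language
        "subtitlesformat": "srt/vtt",     # SRT keeps timestamps for jumping back
        "outtmpl": f"{out_dir}/%(upload_date)s %(title).150s [%(id)s].%(ext)s",
    }
    with yt_dlp.YoutubeDL(options) as ydl:
        ydl.download(urls)
```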

You can add other parameters to the yt-dlp command as needed, such as passwords or cookies.
For example, I use cook/ to expand this string:

List of use possibilities for an AI in DT:

  1. Search
  2. Related
  3. Sort

New feature:
  4. Q and A

If only LiquidText and DEVONthink had a baby.

Yes but it needs to be done well.

I am a big LiquidText fan, so I was thrilled when they introduced their AI integration over a year ago. Unfortunately it is poorly done; it does particularly poorly with large PDF files, which are otherwise LiquidText’s greatest strength.

I was asking about AI search here. I think people confuse AI with artificial intelligence, which is the companies’ fault. These things are not really A.I. What they are is predictive text on steroids, called an LLM. And AI search is different from AI summarization; it is more like a fuzzy search, again on steroids. I truly hate when marketers get to name stuff.

If anyone has a video of DT3 being used for legal research, could they please post it here? The LiquidText one below blew my mind!

I think I will invest in LiquidText, not for the AI but for the ease of gathering evidence. I would rather do this with DT3, as I just paid for it. :slight_smile:

I cannot wait for DT to search better. But it does not have to be great; it just has to save time. The DT3 search is not perfect, but it is better than nothing.

You should just be able to do a natural-language search.

I see similarities in how DT3 and LiquidText locate similar documents based on a note. The DT3 interface is too rigid. The LiquidText one seems to be very good at getting to the details and then collating them smoothly.

I would advise subscribing to LiquidText on a monthly basis until you’ve used it for a bit. I liked it enough to go beyond the free trial, but ran into its limitations pretty quickly. It’s a great concept, but in practice it just doesn’t quite work well enough. (For my use case, obviously. YMMV.)

LiquidText by itself - without AI - is a stellar product. I highly recommend it for anyone who needs to annotate and link PDF documents.

But their AI feature does not live up to the rest of their reputation. They admit it is a beta product - and even at that, there seems to be minimal progress and minimal interest in soliciting input from customers.

With Llama being open source, perhaps we can get it integrated with DEVONthink?
Doesn’t this technology have as much importance as being able to convert images to text?
I still think so.

Miro.com now has AI built in, and their once Enterprise-only global search is now available on their entry-level plan, making this a great tool for taking notes, storing PDFs with annotations, integrating with Google Drive, and more.

They handle image annotation much better than the competitors because when you paste an image it stays high resolution (unlike most tools, which scale the resolution down). So you can zoom, make notes, and export to PDF, or ask the AI chat to summarize your notes, do calculations, measurements, etc.

The answer to this is purely subjective. Answering from my own perspective, no it’s not. Not even close.

… at second glance, my answer to the comparison you made is unequivocally no. OCR is not a perfect technology; there is no 100% accurate solution for every document. Would I rather have OCR improvements leading closer to a 100% accurate OCR solution versus getting an artificially generated paragraph of text about a PDF? 100%. Also, RAG processes with LLMs are going to be greatly hindered by a poor, or missing, text layer in a PDF. GIGO: garbage in, garbage out.
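
To make the GIGO point concrete, here is a minimal sketch of the retrieval step of a RAG pipeline over a PDF (pypdf and OpenAI embeddings are just example choices). Every chunk comes straight from the PDF’s text layer, so a poor or missing layer poisons the embeddings and the retrieval downstream:

```python
# Minimal sketch of the retrieval step of a RAG pipeline over a PDF.
# pypdf and OpenAI embeddings are example choices; the point is that the
# chunks come straight from the PDF's text layer, so a poor or missing
# OCR layer poisons everything downstream (GIGO).
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()

def pdf_text_chunks(path: str, size: int = 1000) -> list[str]:
    text = "".join(page.extract_text() or "" for page in PdfReader(path).pages)
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def most_relevant_chunk(path: str, question: str) -> str:
    chunks = pdf_text_chunks(path)
    chunk_vectors = embed(chunks)
    question_vector = embed([question])[0]
    # dot product works as a cosine score because the embeddings are unit-normalized
    scores = [sum(a * b for a, b in zip(vec, question_vector)) for vec in chunk_vectors]
    return chunks[scores.index(max(scores))]
```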
