OpenAI in DEVONthink?

EthicalEgret · November 8, 2023, 5:40am

Is the DThink team thinking about ways to bring OpenAI capabilities to DEVONthink? It would be very helpful for my research tasks if I could give prompts to one of the new GPT assistants and have it call functions in DEVONthink. I’m using DThink for historical research and some tasks are monotonous and slow, so automation could help a lot there. Other tasks could be more in the vein of data mining and generating summaries.

The OpenAI keynote happened recently (Discourse won’t allow me to include the link).

cgrunenberg · November 8, 2023, 6:45am

We’re considering various options but no promises.

What kind of tasks? Did you have a look at the possibilities of smart rules and/or AppleScript/JXA?

rfog · November 8, 2023, 12:19pm

Until something reaches DT, I’m using the service below. You upload a PDF and can ask as many questions as you want in the language you want (I upload texts in English, French and Spanish, but I ask in Spanish and it answers in Spanish).

As an example, I was very surprised when I asked about Conway’s Game of Life in connection with a Wolfram blog entry that had no reference to Conway: the AI was able to relate my question to the document and explained how the Wolfram material could be used to create a multi-dimensional Game of Life.

The service is: reeder.ai

EthicalEgret · November 8, 2023, 6:27pm

I’m unfamiliar with AppleScript/JXA. I would have to find out if these have the functionality I need for my tasks.

Currently I’m splitting a collection of 50 PDFs into smaller PDFs and then generating an annotation file for each one. It would be helpful to have an AI read the PDFs and attempt to auto-generate the data I need for each resulting PDF, which would be used in the filename and in the annotation file.

BLUEFROG · November 8, 2023, 6:48pm

Auto-generate what data?

EthicalEgret · November 8, 2023, 8:06pm

Things that are found in the document, like date, author, publisher, title, etc.

BLUEFROG · November 8, 2023, 8:21pm

I would not suggest putting those bits of metadata in a document’s name. Tags, Finder comments, or custom metadata (when running the Pro or Server editions) would be more suitable.

EthicalEgret · November 8, 2023, 8:31pm

For naming format I’m using ‘Date, Publisher, Title’ or in the case of a private email (some of the documents are private emails), ‘Date, Sender to Receiver’. The other data would be used for populating the annotation file. These annotation files are for exporting reference citations to a reference manager.

cgrunenberg · November 9, 2023, 6:27am

Do you have a sample file that you could share?

EthicalEgret · November 9, 2023, 8:53am

Sure, but they’re not that remarkable. You can find all of the PDFs I’m splitting here. Each PDF represents a box of papers from an old archive, so the types of documents in each PDF vary. Some are newspaper clippings, some are faxed letters and papers, others are printed email transcripts.

I had ChatGPT write a Python script that splits a PDF at its sticky note annotations. I ran it successfully for the first time today, so now that I have it working, I have to go through all of the other PDFs and add a sticky note annotation at every page I want to split on.
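The script itself isn’t posted here, so the following is only a sketch of the part that turns marked pages into page ranges, not the actual script. It assumes that a page carrying a sticky note starts a new document and that page numbers are 0-based; `split_ranges` is a made-up name, and finding the annotations and writing the output files (which would need a PDF library) are left out.

```python
def split_ranges(marked_pages: list[int], page_count: int) -> list[tuple[int, int]]:
    """Return (start, end) page ranges, 0-based, with end exclusive.

    Assumption: every marked page begins a new output document, and
    page 0 always begins one even if it carries no marker.
    """
    if page_count <= 0:
        return []
    # Drop out-of-range markers, remove duplicates, and sort.
    starts = sorted({0, *(p for p in marked_pages if 0 <= p < page_count)})
    # Each range ends where the next one starts; the last ends at page_count.
    ends = starts[1:] + [page_count]
    return list(zip(starts, ends))
```

For a 10-page PDF with notes on pages 3 and 7 (0-based), this yields three ranges: pages 0 to 2, 3 to 6, and 7 to 9.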

kewms · November 9, 2023, 5:08pm

DT can split a PDF based on chapters in the PDF outline or pages.

straylor · November 12, 2023, 9:05pm

I too would like to use some kind of AI plugin with DT. I’ve started using Claude to summarize a small collection of papers and am on the beta waitlist for Google’s NotebookLM (previously called Project Tailwind; note the reference to Steven Johnson in the clip). I’m eager to point AI tools at my curated DT archives to find hidden connections and insights I might not find on my own when doing research and writing articles.

EthicalEgret · November 13, 2023, 2:54am

Yes, I second this. I have the very same interest.

EthicalEgret · November 13, 2023, 3:03am

I just want to say that I think DT is in a really unique position in the market to potentially capitalize on this AI opportunity. I’m not aware of other tools that have the same kind of access and insight into personal databases that DT has. Imagine an AI that knows your entire database of PDFs, videos, and audio files, and can interactively perform tasks like a writing assistant or a virtual collaborator. I really hope this will become reality soon. Please build this!

kewms · November 13, 2023, 3:22am

Be aware, though, that giving a third-party data center access to “your entire database of PDFs, videos, and audio files” has pretty significant privacy implications.

straylor · November 13, 2023, 4:07am

Agreed. This article says the following about Google’s NotebookLM:

“Privacy is a big concern when it comes to AI-driven tools. According to Google, your data is in safe hands. The company assures that any data collected from your NotebookLM interactions won’t be used to train new AI models or shared for other users to see. However, the data may be used to improve the existing features of Google NotebookLM.”

… and this piece about privacy using Claude.ai.

cgrunenberg · November 13, 2023, 8:34am

That’s of course what all cloud-based companies have to claim. But even if that’s true at the moment, the policy might change, hackers (or employees) might get access to the data, or the company might cease to exist. It would be even worse if your data were stored only in the cloud, but at least that’s not the case here.

EthicalEgret · November 13, 2023, 8:41am

Practically every document I’m reviewing is already in the public domain, or in the case of books, most are probably already in Google Scholar.

kewms · November 13, 2023, 8:48am

True for your work, maybe, but not for everyone’s.

kewms · November 13, 2023, 8:52am

Also, if a company wishes to operate in a country, it has little choice but to follow the laws of that country. Any writer or researcher focusing on politically sensitive topics should consider whether a warrant served on a third party would expose information that might be dangerous to them or to their sources.