Just wanted to second everyone who is interested in Claude/ChatGPT/etc. AI integration, but for me two things are crucial:
to not share my data with companies and
to have all my questions remain private
I would like to be able to ask questions of my data, but I also want to ask broad questions and be provided links to relevant information from my databases. Thank you to the team for considering it. Fingers crossed it will happen.
The request is noted, but public LLMs like ChatGPT, Gemini, etc. operate on their own terms, with their own definitions of “privacy”. So you use them at your own risk. That being said, we will do what we are able to do, but those options are more limited once you’re asking questions of online agents.
It does not have to be ChatGPT, Gemini, Claude, etc. It is probably better if it is not. If the AI is as private as DT3 has always been, then it is on my ideal wishlist.
There is the possibility of running a local LLM, e.g., in an application called Ollama. This would be a more private option and something worth investigating. However, performance is very dependent on the hardware with these local options. Storage also has to be considered, as even a small model can be a 4 GB+ download.
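For anyone curious what using Ollama actually looks like: once the app is installed and a model has been pulled, it exposes a local HTTP API (by default on port 11434), so prompts never leave the machine. A minimal sketch in Python; the model name and prompt here are just placeholders, not recommendations:

```python
import json
from urllib import request

# Ollama's server listens on localhost only by default,
# so prompts and documents stay on your own machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its reply."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires Ollama running and the model already pulled):
# print(ask("llama3.2", "Summarize my notes on local LLMs in two sentences."))
```

On the storage point: Ollama lets you relocate its model directory (e.g., onto an external drive) via the `OLLAMA_MODELS` environment variable, so the multi-gigabyte downloads need not sit on the internal disk.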
I am interested in Ollama and whatever the other options are. I am tempted to experiment with them now, but I am such a luddite and do not have the time. Could the potential storage problem be solved with an external hard drive? Or would both the DT3 database and Ollama (or an alternative) need to be on the same drive?
This morning I used Mixtral via DuckDuckGo to ask more about it. Most of it went over my head, but I was left with the impression that a cloud server is a must. I cannot help thinking that a cloud server would be a vulnerability.
This is the beginning of the answer it gave me:
Sure! I’ll provide you with a high-level overview of the steps required to set up and self-host an AI language model on your local machine or a cloud server. For this example, I will use Hugging Face’s Transformers library.
Set up your local machine or cloud server:
Local machine: Ensure that your computer has sufficient computational power and resources to run the AI language model. Install the necessary software, such as Python, pip, and Git.
Cloud server: Choose a cloud provider (e.g., AWS, Google Cloud, Azure, or DigitalOcean) and create a virtual machine (VM) with the required specifications. Install the necessary software on the VM.
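To make the quoted steps concrete, here is a minimal sketch using Hugging Face’s Transformers pipeline API, as the answer suggests. This assumes `transformers` (plus a backend such as PyTorch) has been installed with pip; the model name is just a small demonstration model, not a recommendation:

```python
# A tiny (~350 MB) demonstration model; serious use would need a larger one.
MODEL_NAME = "distilgpt2"

def generate(prompt: str, max_new_tokens: int = 30) -> str:
    """Run local text generation with the Transformers pipeline API."""
    # Imported lazily: transformers is a heavyweight dependency and
    # the first call downloads the model weights.
    from transformers import pipeline
    generator = pipeline("text-generation", model=MODEL_NAME)
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]

# Example (downloads the model on first run):
# print(generate("Self-hosting a language model means"))
```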
That entirely depends on your data. A database of technical papers is likely already available through other sources. A database of personally identifiable medical information, not so much.
Maybe it is fine in this situation. But there are many instances in which I would not want to share the data or my prompts. I would not even want to share the questions I am asking about papers that are freely available on Google Scholar or similar places.
Some people have been able to run serious local self-hosted models, but the minimum requirements are about 256 GB of RAM, several TB of disk, and so on. I mean, you need that for serious work; tiny toy models hallucinate more than they reason and have no “real” intelligence. IMHO, anything lower than 400B parameters is a toy.
If you are running on a Mac, as I guess you must be if you are using DEVONthink, then using Apple Intelligence gives you control over the visibility of your data. So you can remain in control. If you stray outside of these controls, then it is at your own risk.
This afternoon, I was working on a research paper, and, for the life of me, I could not nail the precise keywords necessary to find what I was looking for in a dissertation within DEVONthink. Finally, I put it in NotebookLM, asked for the general idea, and found what I was looking for in less than a minute. I would love the option to pick between a local LLM or API-key integration within DT someday, if possible. Obviously, local solutions are rather GPU-intensive, however.
Note: Elephas’ proposed method of going into the database’s internals is not something that was discussed with us, so going that route is a risk you take on your own. Just something to be aware of.
Yes. It ingests only; it doesn’t modify. It takes the text and embeds it into vectors. The only question I have is: does the location of the files inside the DT database change after creation? If they don’t change, that would be great, since Elephas wouldn’t have to keep reindexing. I’m not sure whether the files in DT change location. Can you help clear this up?
I’ve also been telling the developer to talk to you guys. These two products are amazing together, and the combination would solve a lot of the AI problems people keep mentioning.
Please don’t change DEVONthink. It’s amazing as is!
Docling is great! It creates nicely formatted Markdown files from PDFs and other file types. I’ve only tried PDFs, but it keeps the structure intact. I don’t care about images, so I’m not sure how that would impact you, but definitely give it a try.
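For reference, the PDF-to-Markdown conversion described above takes only a few lines with Docling’s Python API. A minimal sketch, assuming `docling` has been installed via pip; the file path is a placeholder:

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to Markdown with Docling, keeping document structure."""
    # Imported lazily: docling pulls in heavyweight layout-analysis dependencies.
    from docling.document_converter import DocumentConverter
    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()

# Example (placeholder path):
# print(pdf_to_markdown("paper.pdf"))
```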