Just wanted to second everyone who is interested in Claude/ChatGPT/etc. AI integration, but for me two things are crucial:
to not share my data with companies and
to have all my questions remain private
I would like to be able to ask questions of my data, but I also want to ask broad questions and be provided links to relevant information from my databases. Thank you to the team for considering it. Fingers crossed it will happen.
The request is noted, but public LLMs like ChatGPT, Gemini, etc. operate on their own terms, with their own definitions of “privacy”. So you use them at your own risk. That being said, we will do what we are able to do, but those options are more limited once you’re asking questions of online agents.
It does not have to be ChatGPT, Gemini, Claude, etc. It is probably better if it is not. If the AI is as private as DT3 has always been, then it is on my ideal wishlist.
There is the possibility of running a local LLM, e.g., in an application called Ollama. This would be a more private option and something worth investigating. However, performance is very dependent on the hardware with these local options. Storage also has to be considered, as even a small model can be a 4 GB+ download.
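For anyone curious what using Ollama actually looks like: once the app is installed and a model has been pulled, it exposes a local HTTP API (by default on port 11434), so prompts never leave the machine. A minimal sketch in Python; the model name and prompt here are just placeholders, not recommendations:

```python
import json
from urllib import request

# Ollama's server listens on localhost only by default,
# so prompts and documents stay on your own machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its reply."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires Ollama running and the model already pulled):
# print(ask("llama3.2", "Summarize my notes on local LLMs in two sentences."))
```

On the storage point: Ollama lets you relocate its model directory (e.g., onto an external drive) via the `OLLAMA_MODELS` environment variable, so the multi-gigabyte downloads need not sit on the internal disk.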
I am interested in Ollama and whatever the other options are. I am tempted to experiment with them now, but I am such a luddite and do not have the time. Could the potential storage problem be solved with an external hard drive? Or would both the DT3 database and Ollama (or an alternative) need to be on the same drive?
This morning I used Mixtral via DuckDuckGo to ask more about it. Most of it went over my head, but I was left with the impression that a cloud server is a must. I cannot help thinking that a cloud server would be a vulnerability.
This is the beginning of the answer it gave me:
Sure! I’ll provide you with a high-level overview of the steps required to set up and self-host an AI language model on your local machine or a cloud server. For this example, I will use Hugging Face’s Transformers library.
Set up your local machine or cloud server:
Local machine: Ensure that your computer has sufficient computational power and resources to run the AI language model. Install the necessary software, such as Python, pip, and Git.
Cloud server: Choose a cloud provider (e.g., AWS, Google Cloud, Azure, or DigitalOcean) and create a virtual machine (VM) with the required specifications. Install the necessary software on the VM.
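To make the quoted steps concrete, here is a minimal sketch using Hugging Face’s Transformers pipeline API, as the answer suggests. This assumes `transformers` (plus a backend such as PyTorch) has been installed with pip; the model name is just a small demonstration model, not a recommendation:

```python
# A tiny (~350 MB) demonstration model; serious use would need a larger one.
MODEL_NAME = "distilgpt2"

def generate(prompt: str, max_new_tokens: int = 30) -> str:
    """Run local text generation with the Transformers pipeline API."""
    # Imported lazily: transformers is a heavyweight dependency and
    # the first call downloads the model weights.
    from transformers import pipeline
    generator = pipeline("text-generation", model=MODEL_NAME)
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]

# Example (downloads the model on first run):
# print(generate("Self-hosting a language model means"))
```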
That entirely depends on your data. A database of technical papers is likely already available through other sources. A database of personally identifiable medical information, not so much.
Maybe it is fine in this situation. But there are many instances in which I would not want to share the data or my prompts. I would not even want to share the questions I am asking about papers that are freely available on Google Scholar or similar places.
Some people have been able to run serious local self-hosted models, but the minimum requirements are about 256 GB of RAM, several TB of disk, and so on. I mean, you need that for serious work; tiny toy models hallucinate more than they reason and have no “real” intelligence. IMHO, anything lower than 400B parameters is a toy.
If you are running on a Mac, as I guess you must be if you are using DEVONthink, then using Apple Intelligence gives you control over the visibility of your data. So you can remain in control. If you stray outside of these controls, then it is at your own risk.
This afternoon, I was working on a research paper, and, for the life of me, I could not nail the precise keywords necessary to find what I was looking for in a dissertation within DEVONthink. Finally, I put it in NotebookLM, asked for the general idea, and found what I was looking for in less than a minute. I would love the option to pick between a local LLM or API-key integration within DT someday, if possible. Obviously, local solutions are rather GPU-intensive, however.
Note: Elephas’ proposed method of going into the database’s internals is not something that was discussed with us, so going that route is a risk you take on your own. Just something to be aware of.
Yes. It ingests only; it doesn’t modify. It takes the text and embeds it into vectors. The only question I have is: does the location of the files inside the DT database change after creation? If they don’t change, that would be great, since Elephas wouldn’t have to keep reindexing. I’m not sure whether the files in DT change location. Can you help clear this up?
I’ve also been telling the developer to talk to you guys. These two products are amazing together, and the combination would solve a lot of the AI problems people keep mentioning.
Please don’t change DEVONthink. It’s amazing as is!
Docling is great! It creates nicely formatted Markdown files from PDFs and other file types. I’ve only tried PDFs, but it keeps the structure intact. I don’t care about images, so I’m not sure how that would impact you, but definitely give it a try.
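For reference, the PDF-to-Markdown conversion described above takes only a few lines with Docling’s Python API. A minimal sketch, assuming `docling` has been installed via pip; the file path is a placeholder:

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to Markdown with Docling, keeping document structure."""
    # Imported lazily: docling pulls in heavyweight layout-analysis dependencies.
    from docling.document_converter import DocumentConverter
    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()

# Example (placeholder path):
# print(pdf_to_markdown("paper.pdf"))
```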