DT4 - AI - Ollama for local LLM - Recommendations?

Hi,

just tested Ollama with QwQ on a Mac Studio M2 with 32 GB - it worked, but slowly (a couple of minutes for a mediocre summary of four pages). Great!

As privacy is probably one of the main reasons not to use any API / public model - does anyone have recommendations on which LLM to use, which settings, and how important the context window is (I arbitrarily increased it to 10k)? And maybe what minimum hardware spec is recommended - 32 GB of memory seems too small.
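
In case it helps others reproduce the setup: this is roughly how the context window can be raised per request through Ollama's REST API (the num_ctx option). A minimal sketch only - the model tag, file name, and 10k value are placeholders for whatever you actually use:

```python
# Minimal sketch (assumes Ollama is running on its default port 11434
# and the model tag has already been pulled; the file name is hypothetical).
import requests

text = open("four_pages.txt").read()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwq",                         # any locally pulled model tag
        "prompt": f"Summarize the following text:\n\n{text}",
        "stream": False,
        "options": {"num_ctx": 10240},          # context window in tokens
    },
    timeout=600,                                # local models can be slow
)
print(resp.json()["response"])
```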

Regards
Axel

Running a local AI is not going to be feasible or performant for most people. 32GB is not a small amount of RAM… until you want to run local AI.

Again, you should read the AI Explained section in the Help regarding managing expectations and a realistic view of AI in DEVONthink.

For example, the recently released Gemma 3 27B with a context window of 32k (theoretically up to 128k) works quite well on an M1 Ultra with 128 GB and supports tool calls (LM Studio) or vision (Ollama). But of course it still doesn't compare to commercial models.
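
For illustration, the vision path through Ollama looks roughly like this - a sketch only, assuming a multimodal model pulled locally under the gemma3:27b tag; the image is passed base64-encoded in the images field:

```python
# Sketch: sending an image to a local multimodal model via Ollama's
# /api/generate endpoint. Model tag and image path are assumptions.
import base64
import requests

with open("scan.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:27b",
        "prompt": "Describe what is shown in this image.",
        "images": [img_b64],
        "stream": False,
    },
    timeout=600,
)
print(resp.json()["response"])
```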

Is this "AI Explained" section available somewhere without needing to download the beta?

See https://www.devontechnologies.com/download/extras

I'm running deepseek-r1 with the 70b-parameter model on Ollama on an MBP with an M1 Max and 64 GB of RAM… not exactly fast (1-4 minutes depending on the complexity of the request), but workable for most of the stuff I have played around with so far, considering this is truly private…

For your case with 32 GB of RAM, I would recommend Mistral Small 3. It’s a solid model with good token speed compared to similarly sized models.

Also, I’ve found local models generally perform better via LM Studio than via Ollama, so that would be worth a try. Enabling “Flash Attention” in LM Studio does speed things up a bit.
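
For anyone comparing the two: LM Studio can expose an OpenAI-compatible local server (by default on port 1234), so a quick test against it can look roughly like the sketch below. The model name is a placeholder for whatever you have loaded, and “Flash Attention” is toggled in LM Studio’s model settings, not through the API:

```python
# Sketch against LM Studio's OpenAI-compatible local server
# (default http://localhost:1234/v1). The API key can be any string.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

reply = client.chat.completions.create(
    model="mistral-small-3",   # placeholder: use the model loaded in LM Studio
    messages=[
        {"role": "user", "content": "Summarize: local LLMs trade speed for privacy."},
    ],
)
print(reply.choices[0].message.content)
```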
