Automated summaries using AI / local LLM

Becherbaer5 · April 8, 2025, 9:02am

Hello everyone,

I’m currently trying to summarize a large number of plain text files using a local LLM based on LM Studio.

I’ve set up a smart rule for this, using the “Chat - Query” action and I want to set the response to a metadata field or something like that. This seems to work technically but I’m struggling with the prompt, Using something like “Please create a summary of the content of this document.” seems to work OK for some documents, but produces JSON snippets for others, sometimes even “null”. However, using “Edit > Summarize via Chat” works reasonably well even for those files that seem problematic using the automated approach.

Any pointers would be welcome, for example knowing what sort of prompt(s) the “Summarize via Chat” function is using might be a starting point to help me experimenting.

Alternatively, a different approach producing similar results would be interesting as well.

Thank you very much.

cgrunenberg · April 8, 2025, 9:11am

Which model do you use in LM Studio and what kind of document? In addition, it’s useful to be more precise and to e.g. specify what kind of summary you would like (few sentences, bullet list, keypoints etc.)

Becherbaer5 · April 8, 2025, 9:17am

I’m experimenting with different options, like MythoMax L2 (good results, small context window) or OpenHermes 2.5 Mistral (ok results, bigger context window). The best summary for my taste would be 1-3 paragraphs of text. The results the built-in summarization feature produces using the same LLM are quite OK most of the time. The results of the LLMs mentioned above are produced by the built-in feature, using “Chat - Query” they are all over the place for some reason.

I won’t give up to easily, because I like the possibilities this integration would allow for.

cgrunenberg · April 8, 2025, 9:25am

A screenshot of your rule and the complete prompt would be useful.

Becherbaer5 · April 8, 2025, 10:05am

The minimal version looks like this:

It’s pretty flimsy, even changing “document” to “text” in this prompt changes quite a lot. But that might just be a sidenote.

cgrunenberg · April 8, 2025, 10:44am

That’s a generic issue of prompting, especially in case of very small local models (due to fewer parameters, heavy quantization and frequently no support for tool calls). You could e.g. try this one:

Summarize the text of this document in few sentences. Be concise. No preamble, no explanations, no reasoning.

Or try a better model, e.g. Mistral Small 3.1 or Gemma 3.

BLUEFROG · April 8, 2025, 12:32pm

Did you read the Getting Started > AI Explained section of the help?

Becherbaer5 · April 8, 2025, 2:30pm

Yes I did. Are you implying that I might need to adjust my expectations when it comes to the quality of the results that can be expected when using local LLMs?

BLUEFROG · April 8, 2025, 2:38pm

Yes, of course. That whole section is an honest and realistic view of AI in DEVONthink so people understand – and yes, manage their expectations – about AI, especially local. You should also have realistic expectations of commercial models, but at this time they are far better than anything running locally for most purposes.

If you wanted to run the full Deepseek R1 671b, this Mac may be able do it but still may struggle a bit…

It would also take almost half your hard drive.

Becherbaer5 · April 8, 2025, 2:57pm

I understand - but in essence, even with the local models the results given by the built-in “Summarize via Chat” feature are reasonably good. I just can’t seem to get anything like that when I try to use the same model using “Chat - Query” in a smart rule, which is why I was curious what the prompt looks like in the internal function.

But anyway, I’ll just experiment a bit more on my own …

PS: The prompt suggested a few posts above gives back a meaningless JSON message, but I’ll keep trying

BLUEFROG · April 8, 2025, 2:59pm

That is definitely something you’ll need to do. There is no “one size fits all” answer and answers can (and often do) vary even when re-asking something you just asked.

cgrunenberg · April 8, 2025, 3:36pm

The prompt wouldn’t work. It’s an internal command and therefore not as flexible as the chat assistant or smart rules but also more reliable, faster and cheaper (in case of commercial models).

Becherbaer5 · April 8, 2025, 4:25pm

I’ll just fiddle around and report back if I end up getting reasonable results. As I mentioned, this would open a world of possibilities.

Becherbaer5 · April 9, 2025, 3:12pm

Now we are getting somewhere. The DT-Part is actually quite trivial:

The model makes all the difference, which doesn’t come completely unexpected. Here’s what I am using now:

In LM Studio, the model (llama-3-8b-lexi-uncensored) uses the following parameters:

Context length: 8129
System prompt: empty
Temperature: 0.8
Top K sampling: 64
Top P sampling: 0.95

It’s the Q8_0 quantization of the model, which runs reasonably well on my MacBook Pro M3 Max. Used like this, the results are surprisingly ok - not quite at ChatGPT level, but not far behind. Since my use case is the pre-selection of 12.000+ text documents this is more than good enough for me, since it’s essentially free.

The only thing to keep in mind is that the prompt really needs to be very simple. Making it more complex or longer has a huge impact.

Maybe this is helpful for someone.

rkaplan · April 9, 2025, 4:15pm

Temperature of 0.8 for summarization?

That might work for a creative summary or creative writing.

But assuming the documents are non-fiction a Temperature of 0 seems most appropriate.

BLUEFROG · April 9, 2025, 4:21pm

Please clarify what you’re referring to here.

Becherbaer5 · April 9, 2025, 4:33pm

I have retrieved a lot of text documents from a number of sources (market analysis, stock market data etc.). The retrieval was based on a technical approach that crawled various sources without looking at the actual content of the text files, PDFs etc.

The content was then converted to plain text and I now want to try and filter out the stuff which is probably irrelevant.

That’s why I’m currently running some tests where I store the summary to see how good / bad it is. In the end, I don’t need the summaries as such, I just want to flag the stuff which seems relevant based on it.

So for now, storing the summary is just a vehicle to actually be able to look at the result - if it’s working ok for some samples I will probably just let the AI flag the stuff that is relevant and worthwhile to explore further.

BLUEFROG · April 9, 2025, 4:36pm

Are the documents stored in DEVONthink or LM Studio?

cgrunenberg · April 9, 2025, 4:36pm

This makes me wonder whether the Chat - Continue if… action wouldn’t be more useful to mark interesting stuff.

Becherbaer5 · April 9, 2025, 4:41pm

You are completely correct - this is what I will do. For now, I need to be able to actually read for myself what the LLM comes up with. When the responses are good enough to be able to just ask a yes / no question I will then use Chat - Continue if … and just label or flag the respective items.