I am using Deepseek-r1 locally through Ollama and have configured DT4 to use it, which works well - thank you, this is amazing!
However, when using Ollama with AnythingLLM as a frontend, I get to see the model's reasoning as well as the end result being generated in real time (or rather trickling in, considering it runs locally). Derived from this, there are two improvements that would be helpful:
- as there is additional value and information in the model's reasoning process, it would be great to get that output in the DT4 chat as well (and not "just" the end result)
- as local LLMs are inherently slower than cloud-based ones, it would be great to see the reasoning / the reply come in as the model sends it out, i.e. token by token (a small sketch of what Ollama already streams is below). This would provide a much more interactive "feeling".
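
For reference, here is a minimal sketch of what Ollama's streaming chat endpoint already delivers when queried directly (assuming the default local install at localhost:11434; the model name and prompt are just placeholders). Each chunk is one fragment of the reply, and Deepseek-r1 typically wraps its reasoning in `<think>…</think>` tags inside that same stream, so a client can show the thinking and the answer as they arrive:

```python
# Sketch: stream a reply token by token from a local Ollama instance.
# Assumes Ollama is running at its default address and deepseek-r1 is pulled.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": True,  # chunks arrive as newline-delimited JSON
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # each chunk carries the next fragment of the reply;
    # deepseek-r1's reasoning shows up inside <think>...</think> tags
    print(chunk["message"]["content"], end="", flush=True)
    if chunk.get("done"):
        break
```

This is roughly what AnythingLLM does under the hood, which is why the reasoning and the answer trickle in there instead of appearing all at once.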
thank you for considering…