I revisited my auto-renaming workflow this morning, mainly to evaluate different models and see if I could find a better one.
On my machine (MacBook Air M2, 16 GB) the gemma3n:e4b model gave the best results of all the models I tested against a handful of documents, which included:
gpt-oss:latest
glm-4.7-flash:latest
deepseek-r1:1.5b
mistral:latest
gemma3:latest
llama3.1:8b
lfm2.5-thinking:latest
(Sorry for the latest tags; I already deleted the models and didn’t record which specific variants those were.)
The smaller models (≤ 4B) tend to return filenames that don’t quite fit the request (YYYY-MM-DD Company Title), while the larger ones take a really long time (> 30 s) and bring everything else on the machine to a halt, so I can’t do any other work in the background.
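As a rough illustration of what “fits the request” means here, a suggested filename can be checked against the YYYY-MM-DD Company Title pattern with a simple regex. This is just a sketch for judging model output by eye; the regex and sample names are my own, not part of the Smart Rule:

```python
import re

# Rough check that a suggested filename matches "YYYY-MM-DD Company Title".
# Illustrative only: a plausible date, then at least two more words.
PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) \S+ .+$")

def fits_request(name: str) -> bool:
    """Return True if the model's suggestion follows the requested format."""
    return PATTERN.fullmatch(name.strip()) is not None

print(fits_request("2024-03-15 ACME Invoice"))   # True
print(fits_request("invoice_final_v2.pdf"))      # False
```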
I tried gemma3n:e4b for the first time today, and it turned out to be a great balance between speed and accuracy. Working with it in the chat window also gave fairly reasonable results.
The last flourish I added to my workflow was for the Smart Rule to play a sound when the execution completes, which allows me to switch my attention away until I hear the sound.
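In case it helps anyone replicating the sound trick: macOS ships its alert sounds under /System/Library/Sounds, so a script action only needs a single afplay call. A small sketch, assuming you trigger it from a Smart Rule script action (the helper name is my own; a plain shell one-liner would do just as well):

```python
import subprocess

# Standard macOS alert sound; any .aiff under /System/Library/Sounds works.
SOUND = "/System/Library/Sounds/Sosumi.aiff"

def completion_sound_cmd(sound: str = SOUND) -> list[str]:
    """Build the afplay invocation for the script action (afplay ships with macOS)."""
    return ["afplay", sound]

# On macOS, uncomment to actually play the sound when the rule finishes:
# subprocess.run(completion_sound_cmd(), check=True)
```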
Indeed, there is no “one size fits all” and finding the model you prefer takes time and testing.
I use this method very often, especially as I work on multiple devices. I can hear, e.g., a Sosumi and know a particular rule just finished.
Thank you for sharing this information. Indeed, gemma3n:e4b offers a better balance between speed and quality on my MacBook Pro M1 with 64 GB RAM than the models I tested previously, such as Mistral-Small:3.2 or qwen3:8b. I also install larger models via native mlx-lm (e.g. qwen3-32b-mlx-6bit), which makes them run faster than local Ollama, but unfortunately they are not available in DEVONthink.
As far as I know, MLX support in Ollama (announced in 2025) is still experimental and hasn’t made it into a stable release yet. LM Studio is built on Electron, which consumes extra memory, so I prefer a native installation using mlx-lm.
Once you install mlx-lm and download a specific model, you can run it in chat mode right from the terminal, e.g.:

mlx_lm.chat --model mlx-community/c4ai-command-r-08-2024-4bit

Alternatively, you can use a Python script to load the model and execute specific tasks.
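For instance, a minimal Python sketch along those lines, using mlx-lm’s load and generate functions with the same model as above. The prompt wording and the excerpt limit are my own assumptions, not a tested setup:

```python
# Sketch: a one-off renaming task via mlx-lm's Python API.
# Requires `pip install mlx-lm` and Apple silicon, so the import is
# kept inside main() and the prompt helper stays usable without it.

def build_rename_prompt(doc_text: str) -> str:
    """Assemble the renaming instruction plus a document excerpt."""
    return (
        "Suggest a filename in the form 'YYYY-MM-DD Company Title' "
        "for the following document. Reply with the filename only.\n\n"
        + doc_text[:2000]  # keep the prompt short for small models
    )

def main() -> None:
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/c4ai-command-r-08-2024-4bit")
    prompt = build_rename_prompt("Invoice from ACME Corp, dated 2024-03-15 ...")
    print(generate(model, tokenizer, prompt=prompt, max_tokens=50))

if __name__ == "__main__":
    main()
```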