Gemma3n:e4b is a great local model for renaming and summarization

I revisited my auto-renaming workflow this morning and especially wanted to evaluate different models to see if I could find a better one.

On my machine (MacBook Air M2, 16 GB) I found that gemma3n:e4b gave the best results among the models I tested against a handful of documents, including:

  • gpt-oss:latest
  • glm-4.7-flash:latest
  • deepseek-r1:1.5b
  • mistral:latest
  • gemma3:latest
  • llama3.1:8b
  • lfm2.5-thinking:latest

(Sorry for the :latest tags; I already deleted the models and didn’t record which specific variant each one was.)

The smaller models (≤ 4B) tend to return filenames that don’t quite match the requested format (YYYY-MM-DD Company Title), while the larger ones take a long time (> 30 s) and bring all other tasks to a halt, preventing me from doing any other work in the background.
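For what it’s worth, a small check like the one below can reject model output that doesn’t match the YYYY-MM-DD Company Title pattern before renaming anything. The helper name and regex here are my own illustration, not part of the original smart rule:

```python
import re

# Accept only names like "2024-03-15 Acme Invoice":
# a date, a space, then at least one non-whitespace word.
FILENAME_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \S.*\S$")

def is_valid_name(name: str) -> bool:
    """Return True if the suggested filename matches the requested format."""
    return bool(FILENAME_RE.match(name))
```

If the model’s suggestion fails this check, the rule could simply leave the original filename in place instead of applying a malformed one.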

I tried gemma3n:e4b for the first time today and it turned out to be a great balance between speed and accuracy. Working with it in the chat window also gave fairly reasonable results.

The last flourish I added to my workflow was for the Smart Rule to play a sound when the execution completes, which allows me to switch my attention away until I hear the sound.

Indeed, there is no “one size fits all” and finding the model you prefer takes time and testing.


I use this method very often, especially as I work on multiple devices. I can hear, e.g., a Sosumi and know a particular rule just finished. :slight_smile:


Hi there, this sounds very interesting to me. What would such a renaming workflow look like? Is it possible to correct typos etc. automatically?

Many thanks!

Peter

Note: Renaming documents does not require using external AI. In fact, it most often doesn’t and AI can be much, much slower.


Correct typos in what, based on what?


The way my workflow works:

  1. Hit “Scan to DEVONthink” on my Fujitsu ScanSnap.
  2. Once the document hits the DEVONthink inbox, it triggers my “AI Rename File” rule, which queries the LLM to rename the file automatically.

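The rename step in the workflow above could be sketched roughly like this, assuming Ollama’s default /api/generate endpoint on port 11434. The prompt wording and function names are illustrative, not the actual smart rule:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_rename_request(model: str, text: str) -> bytes:
    """Build the JSON payload asking the model for a single filename."""
    # The prompt is an assumption about how such a rule might be worded.
    prompt = (
        "Suggest a filename in the form 'YYYY-MM-DD Company Title' "
        "for this document. Reply with the filename only.\n\n" + text
    )
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def suggest_name(model: str, text: str) -> str:
    """Query the local Ollama server and return its suggested filename."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_rename_request(model, text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

With a local Ollama instance running, something like suggest_name("gemma3n:e4b", extracted_text) would return the candidate name to apply to the file.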
Thank you for sharing this information. Indeed, gemma3n:e4b offers a better balance between performance/speed and quality on my MacBook Pro M1 with 64 GB RAM than the previously tested models, such as Mistral-Small:3.2 or qwen3:8b. I also install larger models via native mlx-lm (e.g. qwen3-32b-mlx-6bit), which makes them run faster than local Ollama, but unfortunately they are not available in DEVONthink.

The latest versions of Ollama added MLX support. Another option might be LM Studio.

Can you share a bit more about how you run the MLX models and how much of a performance improvement you have observed with it?

As far as I know, MLX support in Ollama (announced in 2025) is still experimental and hasn’t made it to the stable releases yet. LM Studio is built on Electron, which consumes extra memory. I prefer a native installation using mlx-lm.

Once you install mlx-lm and download a specific model, you can run it in chat mode right from the terminal, e.g. mlx_lm.chat --model mlx-community/c4ai-command-r-08-2024-4bit. Alternatively, you can use a Python script to load the model and execute specific tasks.
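A minimal Python sketch of the scripted route, assuming mlx-lm’s load/generate API; the model path and prompt builder are examples only, not the poster’s actual setup:

```python
def make_prompt(text: str) -> str:
    """Hypothetical prompt builder for a one-line summary task."""
    return "Summarize in one line:\n\n" + text

def run(model_path: str, text: str) -> str:
    """Load an MLX model and generate a response for the given text."""
    # Heavy import kept local so the helper above stays dependency-free.
    from mlx_lm import load, generate

    # load() fetches the model from Hugging Face on first use.
    model, tokenizer = load(model_path)
    return generate(model, tokenizer, prompt=make_prompt(text))
```

Calling run("mlx-community/c4ai-command-r-08-2024-4bit", document_text) would then produce the model’s summary, all without Ollama or LM Studio in between.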