Therefore the performance of a small, local model like Gemma 3:12b isn’t really surprising in the end.
As discussed before, I have been using gemma-3-27b-it for the last few tests.
Tool calls (and also, e.g., errors) are logged to the file ~/Library/Application Support/DEVONthink/Chat.log.
I checked this in conjunction with the console logs from LM Studio. The log file says this:
2025-04-10 12:39:28,981 INFO: Element Labs (gemma-3-27b-it): 1854 input, 120 output tokens used.
2025-04-10 12:39:28,982 INFO: Element Labs (gemma-3-27b-it): Tool call 'perform_database_search' (642877806): {
    query = "name:Lochinvar AND (conveyancing OR completion OR \"property transaction\")";
}
2025-04-10 12:39:42,148 INFO: Element Labs (gemma-3-27b-it): 1733 input, 136 output tokens used.
Clearly this is in response to these log items from LM Studio:
2025-04-10 12:39:17 [INFO]
[LM STUDIO SERVER] Running chat completion on conversation with 6 messages.
2025-04-10 12:39:22 [INFO]
[LM STUDIO SERVER] Accumulating tokens ... (stream = false)
2025-04-10 12:39:28 [INFO]
[LM STUDIO SERVER] [gemma-3-27b-it] Generated prediction: {
  "id": "chatcmpl-6nmlgx0adxgbkrn628zsd",
  "object": "chat.completion",
  "created": 1744285157,
  "model": "gemma-3-27b-it",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "642877806",
            "type": "function",
            "function": {
              "name": "perform_database_search",
              "arguments": "{\"query\":\"name:Lochinvar AND (conveyancing OR completion OR \\\"property transaction\\\")\"}"
            }
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 1854,
    "completion_tokens": 120,
    "total_tokens": 1974
  },
  "stats": {},
  "system_fingerprint": "gemma-3-27b-it"
}
2025-04-10 12:39:28 [INFO]
[LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)
2025-04-10 12:39:28 [INFO]
[LM STUDIO SERVER] Running chat completion on conversation with 8 messages.
2025-04-10 12:39:34 [INFO]
[LM STUDIO SERVER] Accumulating tokens ... (stream = false)
2025-04-10 12:39:42 [INFO]
[LM STUDIO SERVER] [gemma-3-27b-it] Generated prediction: {
  "id": "chatcmpl-7kav12aqkmaph6z3x52rpm",
  "object": "chat.completion",
  "created": 1744285168,
  "model": "gemma-3-27b-it",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Okay, I apologize for the previous overly specific queries. It seems we're still not finding what you need. \n\nLet’s try a broader approach. Could you please provide me with any text related to this house purchase? Even a few sentences or keywords from a relevant document would be helpful. I can then use that information to search more effectively, or even attempt to extract the solicitor's name and timeframe using natural language processing if enough context is available.\n\nAlternatively, do you remember *where* in DEVONthink you might have stored documents related to this purchase (e.g., a specific group)? If so, I can focus my search there.\n\n\n\n"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 1733,
    "completion_tokens": 136,
    "total_tokens": 1869
  },
  "stats": {},
  "system_fingerprint": "gemma-3-27b-it"
}
2025-04-10 12:39:42 [INFO]
[LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)
2025-04-10 13:39:42 [INFO]
Unloading model gemma-3-27b-it due to TTL expiration.
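For reference, the round trip these logs show appears to be the standard OpenAI-style tool-call loop: the first completion ends with finish_reason "tool_calls", the client executes the tool and appends the result, and the second completion (now 8 messages) produces the text answer. A minimal sketch of that loop, assuming LM Studio's OpenAI-compatible endpoint on localhost:1234; the tool name comes from the log, but run_database_search and everything else here is hypothetical, not DEVONthink's actual code:

```python
# Sketch of the tool-call round trip visible in the logs above.
# Assumes LM Studio's OpenAI-compatible server on localhost:1234;
# run_database_search() is a hypothetical stand-in for DEVONthink's
# real 'perform_database_search' implementation.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "perform_database_search",  # name as seen in Chat.log
        "description": "Search the document index with a boolean query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def run_database_search(query: str) -> str:
    return json.dumps({"results": []})  # hypothetical stand-in

messages = [{"role": "user", "content":
             "What was the name of the solicitor who handled the house "
             "purchase for Lochinvar?"}]

while True:
    response = client.chat.completions.create(
        model="gemma-3-27b-it", messages=messages, tools=tools)
    choice = response.choices[0]
    if choice.finish_reason != "tool_calls":
        print(choice.message.content)  # the final text answer
        break
    # The model requested a tool call: execute it, append the result
    # as a 'tool' message, and loop for the next completion.
    messages.append(choice.message)
    for call in choice.message.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_database_search(**args),
        })
```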
Not sure what can be learned from this log, but it seems to me that the tool call itself succeeds. What still makes no sense, I think, is that the part name:Lochinvar assumes that “Lochinvar” must be part of the file name. I don’t know why this is assumed…
A general comment: since you don’t have a vector database (any reply to my query above about this?) and rely on your own index queries instead, successful queries will need to be very… I don’t know, detailed? Creative? You know, to simulate the fuzzy, relation-based results you would get from a vector database. So I’m surprised by what I see here so far. The query above makes the inexplicable assumption that the search string must be part of the document name, which kills all results in one swoop. (I had another example where I asked about invoices for camping products, and the query I saw assumed that there was a tag “camping”. Hm.)
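Just to illustrate what I mean by fuzzy, relation-based retrieval: with embeddings, a document about conveyancing and solicitors scores high for a question about a house purchase even without sharing a single literal term. A minimal sketch, assuming the sentence-transformers package; the model name is just a common default, and the sample documents are made up:

```python
# Minimal sketch of fuzzy retrieval via embeddings, as a contrast to
# literal index queries. Assumes the sentence-transformers package;
# the model name is a common default, not a recommendation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Made-up sample documents.
docs = [
    "Completion statement from Smith & Co Solicitors re: Lochinvar.",
    "Invoice for camping gear: tent, sleeping bag, stove.",
]
query = "Who was the solicitor for the Lochinvar house purchase?"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity: the conveyancing document ranks first even though
# it never contains the literal phrase "house purchase".
scores = util.cos_sim(query_emb, doc_emb)[0]
for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```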
And while you can see that there are a few possible terms queried at the same time (i.e. conveyancing, completion, “property transaction”), the model only started doing that after I sent it this prompt:
Well… I think you’re doing this wrong. I’m talking about a house purchase, but why would you think that legal documents about this would contain the specific string “house purchase”?
Prior to that, “house purchase” was the only string it was searching for at all! So… what I would expect is that the query would combine lots of relevant synonymous terms to cover as much ground as possible. I ran a test and asked my own model (still gemma-3-27b-it) for interesting search terms like this:
Let’s say you have a large database of indexed documents. A user says to you “What was the name of the solicitor who handled the house purchase for Lochinvar? And in what timeframe did this process take place?” – it is now your job to answer this question, and the first step is to search the index for terms that are relevant to the query. Show me at least 20 such terms, but add more if you think they’re relevant. Cover all aspects of the query posed by the user.
I won’t repeat the results here (you can run it yourself), but they are pretty good even though they need refining, and so I should think that a more useful query would perhaps look like this:
Lochinvar AND (house OR estate OR conveyancing OR completion OR property OR transaction OR purchase OR transfer OR solicitor OR lawyer OR attorney OR "legal counsel" OR "completion date" OR "exchange of contracts" OR "date of completion" OR "offer accepted" OR mortgage OR "stamp duty" OR "property tax" OR "land registry" OR "title deed")
I wonder why such detailed queries are not created. I ran this one and it returns pretty good results. I’m not saying it’s the end of the story; it needs refining again to exclude irrelevant items and so on… but the contrast to the non-functional queries I actually see running is pretty severe.
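Generating such a broad query automatically doesn’t seem hard, either. A rough sketch of what I have in mind, again assuming LM Studio’s OpenAI-compatible endpoint; expand_query and the condensed prompt are my own invention, and a real implementation would need to validate the model’s JSON output:

```python
# Sketch: let the model expand the user's question into search terms,
# then assemble one broad boolean query like the example above.
# Assumes LM Studio's OpenAI-compatible endpoint; all names here are
# hypothetical, not part of any existing API.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def expand_query(question: str, anchor: str) -> str:
    prompt = ("List at least 20 short search terms relevant to this "
              "question, as a JSON array of strings, nothing else:\n"
              + question)
    response = client.chat.completions.create(
        model="gemma-3-27b-it",
        messages=[{"role": "user", "content": prompt}],
    )
    # A real implementation would validate this; models sometimes wrap
    # JSON in markdown fences or add commentary.
    terms = json.loads(response.choices[0].message.content)
    # Quote multi-word terms, OR them together, and AND the result with
    # the one term that must appear (here: the property name).
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return f'{anchor} AND ({" OR ".join(quoted)})'

print(expand_query(
    "What was the name of the solicitor who handled the house purchase "
    "for Lochinvar? And in what timeframe did this process take place?",
    anchor="Lochinvar",
))
```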
Anyway, enough said. I understand this is a work in progress, which is why I’m spending time offering suggestions and feedback. I hope this functionality can be evolved to the point where queries such as mine are simply answered without problems! And please consider using a vector database!