Okay, sure. However, without the details for the tool call the results are obviously random. And your current prompt is already really verbose, with examples and everything…
For local models, why would I care? Sure they may not run as fast, but I’m not paying per token.
The 27B version is still a relatively small model compared to commercial models.
Understood, but that’s beside the point. If this AI thing in DT doesn’t work with local models, it’s useless to me. I use commercial (online) models all the time, but my document archive won’t go there, not even in small parts. Local is what it is!
So if I understand you correctly, you expect an online model to simply use the instruction “A valid DEVONthink search query” and just happen to know what that implies? And that’s why a local model is perhaps “less good” at doing the same job, because it may not understand the requirements as well? Hm… well then we need a “verbose instructions” mode so that the necessary instructions can be included in the prompt.
Meanwhile, are you saying that if I run my test query against ChatGPT 4o or something, I will see a much more detailed query, as I suggested above? I have a hard time believing that, because I still think that if the model is never asked to create a verbose query with lots of alternative words and all that jazz, then it will not randomly do so - big and clever model or not.
Exactly what is being discussed re: local model use. Your expectations from local models need to be even more guarded than for commercial models. They are nowhere near as capable as a commercial AI engine. And you’re not making simple requests of it either. Just saying something in a simple way doesn’t make it a simple thing to accomplish. In fact, the converse is true: the more explicit the prompt, the more easily and “accurately” the task can be handled.
Well, that’s exactly what I said, or meant to say - I’m surprised that the prompt is not more detailed. Not my prompt, of course, but the one generated right now by DT.
Your expectations from local models need to be even more guarded than for commercial models. They are nowhere near as capable as a commercial AI engine.
No doubt that’s true, but as far as I’m concerned as a user, for purposes of analyzing my document archive they’re the only relevant option.
Also, I must say that I’ve had very good success using local models for pretty complex code generation jobs, with the main issue being that they take longer. Different task, different results, perhaps, but until we have a very good prompt, nothing exciting will happen.
I’ll watch curiously what the future brings… and I hope you consider the vector database option because I think that will make an important difference.
I do wonder what the cost would be of running a commercial model with the current implementation though, because the volume of data transferred after a database search could be pretty large - as I understand it, DT would simply pass the complete content of numerous result documents to the model for further processing. That could be costly if you pay per token.
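To put very rough numbers on that - all of the figures below are purely illustrative assumptions, not DT’s actual behaviour or any provider’s real pricing:

```python
# Back-of-envelope token cost estimate; every number here is an assumption
# for illustration only, not DEVONthink's actual behaviour or a real price.
docs = 20                 # documents passed to the model after a search
words_per_doc = 3000      # assumed average length of each document
tokens_per_word = 1.3     # common rule of thumb for English text
price_per_million = 2.50  # hypothetical input price in USD per 1M tokens

input_tokens = docs * words_per_doc * tokens_per_word
cost = input_tokens / 1_000_000 * price_per_million
print(f"{input_tokens:,.0f} input tokens, roughly ${cost:.2f} per query")
# -> 78,000 input tokens, roughly $0.20 per query
```

Per query that is not dramatic, but it adds up quickly if every question re-sends the full content of many documents.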
If you are worried about security - again, use a commercial LLM (but not DeepSeek). Anthropic meets EU and USA privacy requirements - they are even willing to sign a HIPAA BAA for medical information.
As for cost - how large is your database? Do you truly need to query the entire database at once? Perhaps you can use the standard DT search features to determine a subset of documents and then run the search on those?
Or maybe you can do a one-time summary of the critical info - and then run the searches on that summary?
If we understood more about what the documents are and what you are looking for, then we could probably give you further direction.
For sure it doesn’t sound like a project doable on a local LLM.
The number of search results is limited. In addition, for each result the contents are summarized/truncated. And it’s possible to control the costs via the Usage option in Settings > AI > Chat.
But how do you make sure that a summarized and truncated result set includes everything relevant to the processing the model is going to do? You can limit cost this way, but you limit quality of results at the same time.
You can’t. Context windows are not unlimited. Even a vector database uses only data that might be relevant to the query and will fit into the context window.
Even if everything did fit into the context window (e.g. Llama 4 is supposed to support up to 10 million tokens), these models frequently ignore information in the middle. That’s the joy of AI - responses might be incomplete, hallucinated or pure nonsense and therefore should be checked by the user.
The optimum solution requires knowledge of what your database contains and what you will want to retrieve.
For example, if I am summarizing medical records, then it is likely sufficient to create summaries containing page links to (a) the first page of each diagnostic study, summarized as date and nature of study; (b) the first page of each physician office visit, summarized with doctor name, date, and a one-sentence summary of content; (c) the first page of each operative note, summarized by date and type of surgery; and (d) the first page of each hospital discharge summary, summarized by date and discharge diagnoses.
If you are using AI to search your database, the goal likely is not for AI to know every single fact in the database; rather it helps for you to think through the types of queries you will do and how AI can index the information so you can access what you need.
Indeed - what you likely want from AI is NOT a summary but rather a detailed INDEX of your data. Think through (a) what your ideal index would look like, and (b) what prompt(s) would help AI create that index.
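For instance - the wording and field layout below are purely hypothetical, not a DEVONthink feature - an index-building prompt run once per document (e.g. via batch processing) could look something like this, with the returned lines collected into a single index note:

```python
# Hypothetical index-building prompt; adapt the fields to your own documents.
INDEX_PROMPT = (
    "For the attached document, return exactly one line in this format:\n"
    "<date> | <document type> | <people or institutions involved> | "
    "<one-sentence summary>\n"
    "Write 'unknown' for any field you cannot determine."
)
```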
This is true. Of course, a vector database is designed to give you much better odds of finding the contextually relevant parts of your data before you run into the context window problem.
And after that it’s algorithms, even more so when you try to get by without the features of a vector database. I proposed one algorithm above: in conjunction with the existing index-based search, it could be a separate step for the model to help construct the best possible index query. Off the top of my head, multiple steps could be taken (see the rough sketch after this list):
Using the user’s prompt, as I showed above, ask the model to suggest terms and expressions relevant to a database search.
In a similar step, identify “counter-expressions”. For instance, in the case of my search for “house purchase” and given the name of the house, it’s an obvious thought that documents may contain the name of the house as part of an address. Perhaps this is unavoidable, but if a search term “payment” will be used, then depending on context we could use an exclusion term “bank statement” in an attempt to narrow things down.
Use the model again to assemble the database query, following the syntax rules and using proper nesting to combine the inclusion and exclusion terms generated in the previous steps.
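To make the idea concrete, here is a minimal Python sketch of how those three steps might be chained. ask_model() is a hypothetical placeholder for whatever local or commercial model you would actually call, and the prompt wording is only illustrative:

```python
# Minimal sketch of the three-step query-construction idea.
# ask_model() is a hypothetical helper: send a prompt to your local or
# hosted LLM and return its text response.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to your own model")

def build_search_query(user_request: str) -> str:
    # Step 1: let the model suggest relevant terms and expressions.
    terms = ask_model(
        "List search terms and phrases (including synonyms) relevant to "
        f"this request, one per line:\n{user_request}"
    )

    # Step 2: let the model suggest exclusion terms ("counter-expressions").
    exclusions = ask_model(
        "List terms that would match irrelevant documents for this request "
        "and should therefore be excluded, one per line.\n"
        f"Request:\n{user_request}\nCandidate search terms:\n{terms}"
    )

    # Step 3: have the model combine both lists into a syntactically valid
    # DEVONthink query, e.g. (term1 OR term2) NOT (exclusion1 OR exclusion2).
    query = ask_model(
        "Combine the following inclusion and exclusion terms into a single "
        "valid DEVONthink search query, using parentheses, OR within each "
        "group, and NOT for the exclusions.\n"
        f"Include:\n{terms}\nExclude:\n{exclusions}"
    )
    return query.strip()

# Usage (once ask_model is wired up):
#   print(build_search_query("Find everything about the purchase of our house"))
```

The point is not this particular code, only that each step is a small, focused prompt whose output feeds the next one instead of asking the model to do everything at once.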
This all still leaves the problem that the resulting data may be large, which strains the context window, limits performance, drives up cost, etc. Nevertheless, many important steps would already have been taken at that point. Even if DT were simply to show me the list of result documents and say “unfortunately the result set is very large, so please confirm you want to run the target prompt with all this data, or work on narrowing it down”, that would serve an actual purpose.
Anyway, more than enough said I think… this has obviously moved way beyond my original question. I understand where we stand with DT, I hope things will progress, and if my input is ever of interest please feel free to ask. I believe we’ve reached the point where some readers will feel fed up with my willingness to keep communicating. I’m used to that, but I’ll stop after this:
It’s simple: I want to talk to the AI as I would to any computer that has the ability to read all my documents and understand and investigate their content. I want to ask it things and get the correct answers. That’s the entire point of this generation of LLM-based AI - what you’re proposing about specialized indexes and summaries and all that was only necessary as long as the overall goal I’m describing was unachievable. Yes, we used to have separate databases and folders and groups and indexes for separate things - now we just tell the computer to pick things apart for us. And yes, when I use the present tense I realize that in some regards we’re not quite there yet… but we’re close enough that this is my goal. To be clear, of course this may not be DT’s goal - I certainly can’t speak for the makers of this fantastic software, although I hope they work towards the same goal in the medium to long term.
Thanks for reading, everybody. Have a nice evening.
Vector databases make sense when public data is made available for public query, particularly when the database is of petabyte size or more.
If your data fits in a Devonthink database and it is your personal database, then it is quite possible that you might get more useful responses using DT4 if you structure the process optimally.
I am very pleased that DT4 has chosen not to use vector databases; I have tried almost every “Chat with PDF” app that exists and none can give me as useful access to my data as DT4 can.
That said- every case is different. It might well be that your situation is better suited for a vector database. Or there might be a data structure / prompt design that we have not discussed.
All I can be certain of is that it is not true categorically that “vector databases are always better” or that “context window searches are always better.” Each has its place.
Assuming for discussion purposes that you do not know in advance what types of queries you will want to do - might it be that DT4’s lightning-fast non-AI boolean word search would be best? If you use a vector database then some of the keywords on which you query might no longer be searchable.
DEVONthink’s AI integration is optimized for working with documents/selections, no matter whether via the chat assistant, batch processing, smart rules or scripts. This ensures both a focused context and privacy, usually improves the responses, and also reduces costs.
For searching databases we still recommend DEVONthink’s search as it’s fast, free, precise and flexible.