Generative AI is completely optional and doesn’t need a separate index. All data required, e.g. by the chat assistant, is provided on demand using DEVONthink’s own index & AI. That’s why it’s also possible to easily restrict the database search scope (see above).
Sorry, I’m just being slow here. So in terms of usage, if I have a couple of hundred PDFs in a database and I want to ask the AI to provide me with, say, a stat that’s found in one of them, it’ll just carry out the request on the fly? By reviewing DT’s index, which was previously generated without the use of off-machine AI?
Yes, that’s what it will try to do. But just for searching, I would definitely still recommend DEVONthink’s search: no privacy concerns, immediate & precise results, no additional costs, and a lot less energy required.
You can select all of the documents and enter your query, but for a couple of hundred that may not perform very well.
A better plan would probably be to use DT4 to combine all of those PDFs into one, as long as the combined document is smaller than the context window of the LLM you choose.
It may be possible to give a better response/suggestion if you can share in general terms the nature of the documents and what your query would be.
So my use case is that I have a couple of hundred PDFs in a folder. These are public domain documents about public health that I’ve assembled over time - annual HIV statistics, Government strategy documents, clinical study results, a few of my own meeting notes, that sort of thing.
At the moment I have them in a database in DEVONthink. I’ve been using Elephas to sync with this database. In Elephas, when you add the database, it runs the documents through whatever AI model you use and stores the results as vectors. Then you can ask it questions about the contents and it gives you near-instant results.
So I could say “How many new cases of TB were there in the UK in 2021?” and it’ll give me an answer along with a link to the document it found it in. It can also combine references to give me an answer (“How many more cases were there in 2022 compared to 2021?”) and give me references to each.
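For onlookers wondering what “stores the results as vectors” means, here is a minimal sketch of the general idea. It is not Elephas’s actual implementation, which isn’t described here. Each document is turned into a vector once, ahead of time, and a question is answered by finding the documents whose vectors are closest to the question’s. To stay self-contained, the hypothetical `embed` function uses plain word counts as a stand-in for a real embedding model; a real model would also relate “TB” to “tuberculosis”, which this toy cannot. The file names and text are made-up placeholders.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Done once, ahead of time: one vector per document."""
    return {name: embed(text) for name, text in docs.items()}


def search(index: dict[str, Counter], query: str, k: int = 3) -> list[str]:
    """Return up to k document names ranked by similarity to the query."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), name) for name, vec in index.items()), reverse=True)
    return [name for score, name in scored[:k] if score > 0]


# Placeholder documents (no real statistics).
docs = {
    "tb_annual_report.pdf": "Annual report on tuberculosis cases notified in the UK",
    "hiv_strategy.pdf": "Government strategy for reducing new HIV diagnoses",
    "meeting_notes.txt": "Notes from the regional public health meeting",
}
index = build_index(docs)
print(search(index, "How many new cases of TB in the UK?", k=1))  # ['tb_annual_report.pdf']
```

The expensive step (building vectors) happens once, which is why the later questions feel near-instant.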
Needless to say this is pretty damn handy and always seemed like a natural fit for DT, since DT is where all this stuff lives for me anyway.
That sort of query is easily done if the total size of the PDFs is smaller than the context window of the LLM you use.
Google Gemini Flash has the largest context window by far (2 million tokens) but it has a tendency to hallucinate. If your concatenated PDF library fits within that context window you can try it easily.
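Whether a concatenated library fits can be estimated before sending anything. The sketch below assumes the PDFs’ text has already been extracted to plain strings and uses the common rough rule of about 4 characters per token for English text; real tokenizers differ by model, so treat the result as a ballpark. The `reserve` margin for the question and the answer is an arbitrary illustrative value.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb for English prose; actual tokenizers vary


def estimate_tokens(text: str) -> int:
    """Ballpark token count for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], context_window: int, reserve: int = 8_000) -> tuple[bool, int]:
    """Check whether all texts, concatenated, fit in the model's context window.

    `reserve` leaves room for the question and the model's reply (illustrative value).
    Returns (fits, estimated_total_tokens).
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve <= context_window, total


# A 4-million-character library is roughly 1 million tokens by this estimate.
library = ["x" * 4_000_000]
print(fits_in_context(library, context_window=2_000_000))  # (True, 1000000)
print(fits_in_context(library, context_window=1_000_000))  # (False, 1000000)
```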
Assuming that does not work, your question is semantically similar to the issue I face when trying to use AI to query thousands, or sometimes tens of thousands, of pages of medical documents to determine “What are all the diagnostic test results?”, “What are all the diagnoses?” or “What are all the post-op reports?”
In my case the questions I ask are consistent but you could easily write a script that lets you change the question each time.
So in your case you could write a script that goes through each document individually, asking your question about new cases of TB in the UK in 2021. The script can then concatenate all those responses and pass them to a master query that creates a chart, bullet points, etc. from all of the responses.
On a large set of documents this might take 15 minutes or so to execute and might cost $3 or so using Claude 3.7 (which is the best LLM for this sort of inquiry). So it works if it’s an important inquiry that you do occasionally, but not if you do this sort of inquiry dozens of times a day, as you would a Google search.
But if, for example, you are doing an exhaustive literature search as you begin a specific new project, or there is a reason why this is a really important question to answer definitively, then the scripting approach would work well.
Again, I use this to create a chronology or index of medical records containing tens of thousands of pages. It produces an executive summary, bullet-point summary, chronological summary, and various other reports, all with copious hyperlinks to the source document/page. The result is stunning, and the final index or chronology is often easier/faster to use than even a dedicated clinical EHR system used in clinics/hospitals.
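The per-document approach described above is essentially a map-reduce pattern: ask the same question of each document separately (map), then send only the short answers to one final summarizing query (reduce). Below is a minimal sketch of that shape, not the poster’s actual scripts. `ask` is a hypothetical placeholder for whichever LLM client you use; here it is an offline stub so the example runs. The prompt wording and the “NONE” convention are illustrative assumptions.

```python
from typing import Callable

Ask = Callable[[str], str]  # hypothetical: takes a prompt, returns the model's reply


def per_document_answers(docs: dict[str, str], question: str, ask: Ask) -> list[tuple[str, str]]:
    """Map step: one model call per document; keep only documents that answer."""
    answers = []
    for name, text in docs.items():
        prompt = (
            "Answer using only this document. If it does not contain the answer, reply NONE.\n"
            f"Question: {question}\n\nDocument:\n{text}"
        )
        reply = ask(prompt).strip()
        if reply.upper() != "NONE":
            answers.append((name, reply))
    return answers


def master_query(answers: list[tuple[str, str]], question: str, ask: Ask) -> str:
    """Reduce step: a single call that combines the short per-document answers."""
    joined = "\n".join(f"[{name}] {reply}" for name, reply in answers)
    prompt = (
        f"Combine these per-document answers to '{question}' into bullet points, "
        f"citing the [document] for each.\n\n{joined}"
    )
    return ask(prompt)


def fake_ask(prompt: str) -> str:
    """Offline stub standing in for a real LLM call."""
    if prompt.startswith("Combine"):
        return prompt.split("\n\n", 1)[1]  # echo the joined answers back
    return "(stub answer)" if "tuberculosis" in prompt.lower() else "NONE"


docs = {
    "tb_report.pdf": "Annual tuberculosis report (placeholder text)",
    "hiv_strategy.pdf": "HIV strategy document (placeholder text)",
}
question = "How many new cases of TB were there in the UK in 2021?"
answers = per_document_answers(docs, question, fake_ask)
print(master_query(answers, question, fake_ask))  # [tb_report.pdf] (stub answer)
```

Because every document gets its own model call, cost and run time grow with the size of the library, which is why this suits occasional, important questions rather than everyday lookups.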
And for any onlookers: what you are doing is outside the norm and not something built directly into DEVONthink. You have a very custom setup with supplemental scripts you’ve written to accomplish your specific goal.
Yeah, I’m using it as a personal library to be searched ad hoc, and there’s no rhyme or reason to what I’m looking for at a given moment. It could be stats, an ELI5 on something before I walk into a meeting, or just what a given acronym stands for. It all depends on what my work happens to be at that moment, so there isn’t really any prep I can do. I basically treat it like a personal Perplexity.
It’s not the end of the world if this isn’t in DT right now, since I can already do this elsewhere. But I’d like to put this down as an idea for future consideration. I think deep analysis of your databases as a whole via AI, with no prep work and instant results, would be a great development for DT, my original privacy concerns notwithstanding.
Agreed, and it took a good bit of time to work on those custom scripts. They work well for a very focused task, but not right out of the box.
Read the Getting Started > AI Explained section in the Help.
I’ve not installed 4, so I can’t read that, I’m afraid (unless it’s online as well). I wanted to understand the privacy question from earlier before I installed it.
Very exciting that version 4 will soon be released. I’m very pleased with the licensing model and I’ll definitely be upgrading. I sometimes worry that software I depend on doesn’t pull in enough revenue and might disappear or never get significant improvements. I count on DEVONthink a lot.
As usual, your documentation is top-notch and already available. That’s helping a ton in understanding what’s in version 4.
I’m a bit worried about trying out the beta. I suppose there’s a chance things could get clobbered and my use of version 3 impacted. For example, I wonder what would happen if I were to turn on versioning for an existing database (especially one which indexes cloud files). And I worry about preference files being shared between DT3 and DT4 and being changed in a way that affects my version 3.
What are the things I should avoid doing? Should I only play with new, test databases?
Thanks for the kind words, especially this: “As usual, your documentation is top-notch and already available. That’s helping a ton in understanding what’s in version 4.”
Regarding safety, read this…
For example, I wonder what would happen if I were to turn on versioning for an existing database (especially one which indexes cloud files)
Versioning isn’t an external process. Versions are handled inside the database. Versioning is covered in the Documents > General > Versioning and Inspectors > Versions sections of the Help.
@BLUEFROG how do we know when the public version is ready to purchase and download?
Will you send an email?
I prefer to wait for the stable version, but don’t want to miss it.
BTW: I second those who agree with the new business model, which I find an honest one and a good compromise.
And I’ve also heard your ad on the MPU podcast; I hope it gets results, because you deserve it.
Another announcement will be made, likely a follow-up newsletter, and also via the Check For Updates function when DEVONthink 4 launches.
Thanks so much for the support and encouragement. We really appreciate it!
@BLUEFROG Is it possible to add DeepSeek?
Welcome @quentin
There are currently no such plans, but the request is noted.
And there are usually at least 4 people in a household that benefit from the expense. And they also have different pricing based on where in the world the household is located.
Just want to chime in and congratulate the DEVONthink team on a rock-solid v4 beta!
I haven’t been as active on the forums of late, given work commitments, but I happened to stumble across the news over at the MPU forums. Colour me surprised.
It was an instant purchase, and I even took the step of plunging straight into installing the public beta (something I’m normally far more cautious about). But, as expected, my confidence in the DT team was well placed: I’ve been very happily chugging along, with only a few minor rough edges here and there.
Again — congratulations all!