You could use it directly in the browser within DT. Just save it as a bookmark. You could easily drag and drop PDFs from DT into NotebookLM.
Is this any different than the result you get using Gemini Pro in DT4? Or are you saying it’s effectively free for you to use given the subscription you already have?
As for detailed chapter summaries, are you sure it is really summarizing your PDFs and not using its pretrained knowledge? The only way to really know is for your prompt to request frequent page links back to the source. I was initially impressed with Gemini Pro until I did this and realized it had fooled me: it produced a surprising number of fake references. It was not really summarizing my PDF; instead it understood the general content/context of the PDF and filled in details from its past training.
That could be an ominous scenario in many situations - where Gemini could fail to recognize small but critical differences between your work vs the work of others in your field.
If these summaries are just for fun or just for basic personal background information then the hallucination risk is no big deal. But if these are documents where details matter you cannot ever rely on any LLM without links back to the source of every single fact.
I treat LLM summaries as an index - not as content. If you do otherwise in any situation where facts/details matter it will eventually fool you and can do so in a very embarrassing way if it is for professional / commercial use of any kind.
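The "links back to the source of every single fact" advice above can be baked directly into the prompt. This is a minimal sketch; the exact wording is a hypothetical template, not a tested recipe, but the idea is to force page-level citations so fabricated claims are easy to spot-check against the PDF.

```python
# Hypothetical prompt template (illustration only): forces page-level
# citations so any "fact" without a real page reference stands out.
def citation_prompt(question: str) -> str:
    rules = (
        "Answer using ONLY the attached PDF. "
        "After every factual claim, append the page number it came from "
        "in the form [p. N]. "
        "If the PDF does not support a claim, say 'not found in source' "
        "instead of answering from general knowledge."
    )
    return f"{rules}\n\nTask: {question}"

prompt = citation_prompt("Summarise chapter 3.")
print(prompt)
```

Any citation the model then emits can be checked in seconds by jumping to the cited page; answers with no page references at all are an immediate red flag.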
Thanks for the suggestion. I always use a third party web browser rather than the one built into DT as I rely on the ecosystem of extensions in my web browser of choice. But your suggestion makes sense in the context of NotebookLM+.
I very much view DT as a knowledge management hub, and use third-party apps for viewing and editing DT entries, e.g. I use Typora for creating and editing Markdown docs, Skim for PDFs (incl. PDF annotations) and Hookmark as a connection protocol between different aspects of my knowledge management system.
In all my years of using DT, I’ve always gone the indexing route, in preference to physically adding files to my DT database. I’ve found that, if this is done in a thoughtful manner, it offers great flexibility. My core rationale for working with DT in this manner is that I use multiple technologies for making sense of my accumulated knowledge assets, and having those knowledge assets stored directly in DT creates issues when using non-DEVONtechnologies tools to interrogate them.
It’s going to be interesting to see how the integration of LLM technologies within DT will change my historic DT usage habits. At the moment, I view LLM tech in DT as a “library management assistant”. And much as with real-world assistants, I’ll start with the basics and gradually ramp up the assistant’s “job description” as I discover its strengths and weaknesses.
I test my LLM tools with content that I know well, both in terms of content and tone specifically, so I’m able to more easily spot hallucinations.
With this in mind, I’ve found that using Gemini Pro 2.5 in tandem with NotebookLM+ is significantly better at summarising content than any other LLM tool I’ve used to date. NotebookLM is a separate service from Gemini Pro, and isn’t the same as using Gemini Pro with third-party applications such as DEVONthink. NotebookLM+ provides more generous data caps for subscribers, but the non-subscription version is pretty generous too, within reason. And Google allows non-subscribers to use the latest Gemini Pro 2.5 model with NotebookLM, so it’s easy enough to test for anyone who wants to explore its strengths and weaknesses. I believe you do need to be logged into a Google account in some manner to use the service, but you don’t need a paid Gemini subscription.
https://notebooklm.google.com/
Although NotebookLM will ingest and summarise the equivalent of all three Lord of the Rings books as a single PDF upload, my own workflow is to feed it bite-sized chunks (individual chapters of content), and follow that with an analysis of the combined chapter summaries as a single PDF document. My PDF reference library is extensively annotated using the Skim PDF reader, a personal preference, as Hookmark is able to create deep links to specific content within PDFs. The reason I mention this is that those annotations are a second line of defence in judging the quality of the NotebookLM+ summaries. It’s all too easy to have cognitive biases affect one’s memory of the things one has read, so the annotations serve as aide-mémoires to the actual book/document content.
Thanks @jonmoore
Can NotebookLM give you links back to the page where it got each fact, or only links back to the document? I tested it when it was based on an earlier LLM and it was not able to give me page links to the source.
Regarding data integrity, you never quite know what happens under the surface and to the files you don’t touch in your tests if you don’t use professional software test methods. That being said, I haven’t encountered any data integrity issues. Just as in the DT2 and especially the DT3 betas. @cgrunenberg et al. know their stuff. (Not sure about the first cloud sync module, though.)
I haven’t encountered a single crash or run into a single bug in the DT4 betas – or at least none that hadn’t already been reported in the beta forums. But I haven’t used DT’s new features like AI-assisted batch processing or file renaming.
Thanks for the nice feedback
I don’t want to open a new thread; I hope adding this question here is OK, though it may be a rather basic question. (I’ve read “Running DT 4 Beta” and the AI Help already.)
I don’t understand the field “API Key” in the AI settings. That is, I assume that I can get this key from the AI provider I chose to integrate with DT 4. But, on the other hand, I understood from the help files that it makes sense to play around with different models and that basic functionality is free. I occasionally use the free tiers of ChatGPT and Perplexity, but my free accounts do not seem to offer an API key. So do I have to buy an AI subscription in order to try out AI in DT 4?
The local models could be an alternative, but they require an “enabled web server” – another setting where I don’t understand what it means in this particular case.
So, to put it simply: what is the easiest way to test out AI in DT 4? Buying any basic subscription and (hopefully) getting an API key?
The best way to test out AI is to get API keys from Google, Anthropic, and OpenAI.
Yes it costs money. But unlike a subscription to ChatGPT or similar, use of the API is on a pay-as-you-go basis.
This could be as simple as putting $5 on each of the 3 accounts above - that would be enough to test each one. You would be under no obligation to pay anything else again unless you were to use up your $5 and choose to continue with that LLM provider at a later date.
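To get a feel for how far a $5 pay-as-you-go credit goes, here is a back-of-envelope calculation. The per-million-token price used below is an assumed placeholder, not a quote; check each provider’s current pricing page before relying on the numbers.

```python
# Rough estimate of how many tokens a fixed API credit buys.
# Prices are ASSUMED placeholders; real prices vary by provider and model.
def tokens_per_budget(budget_usd: float, price_per_million: float) -> int:
    """How many tokens a budget buys at a given $-per-million-token price."""
    return int(budget_usd / price_per_million * 1_000_000)

# e.g. at an assumed $1.00 per million input tokens:
print(tokens_per_budget(5.00, 1.00))  # 5000000
```

Even at several dollars per million tokens, $5 covers a lot of experimental chat prompts, which is why the pay-as-you-go route is such a cheap way to trial all three providers.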
Excellent. Thanks a lot @rkaplan . That answers my question – I didn’t know that the paying options differ so much among the providers. I’ll give it a try in the way you described!
That’s useful info, which I’ve been learning by doing (which I don’t mind, but I’m glad to see that I stumbled onto a solution that others are using).
I bought Anthropic and OpenAI credits, but I believe I was able to test Gemini 2 Flash without purchasing anything. I’m enjoying comparing results among the three, so I may even buy some more credits when I exhaust the ones I have.
Here’s something you may not know…
If you’re testing different engines or models in the Chat assistant, you can prompt one and get the result. Then switch the model in the dropdown menu and enter Same prompt. The current model will refer to the previous conversation and respond to the previous command. This is super useful when iterating over the various models or engines.
Oooo. That sounds cool. I’ll try this right after my first cup of coffee in the morning.
It doesn’t know anything. It generates statistically likely text in response to text it has been fed, like the predictive text feature on your phone keyboard, but with orders of magnitude more data.
If you give it some text which it has seen a lot online (or in illegal copies of books), it will spit out text that is likely to be a response to the text you gave it. That’s the magic part.
If you give it some text which it has rarely or never seen online, it will go ahead and do its best to come up with some text that’s statistically likely to make sense as a response to the prompt you gave it. The result will probably be plausible in terms of syntax and structure, and hence look like an answer, but be completely wrong in terms of facts because it has no understanding of facts.
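The “predictive text, scaled up” framing above can be made concrete with a toy example. This is a deliberately tiny bigram model, nothing like a production LLM, but it shows the same core mechanic: the next word is sampled from whatever statistically followed the current word in the training text, with no understanding of what any word means.

```python
import random
from collections import defaultdict

def train_bigrams(text: str) -> dict:
    """Count which word follows which in the training text."""
    words = text.split()
    follows = defaultdict(list)
    for a, b in zip(words, words[1:]):
        follows[a].append(b)
    return follows

def generate(follows: dict, start: str, n: int, seed: int = 0) -> str:
    """Emit up to n words, each sampled from words seen after the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        options = follows.get(out[-1])
        if not options:  # word never seen in training: the model is lost
            break
        out.append(rng.choice(options))
    return " ".join(out)

model = train_bigrams("the goat eats the cabbage and the wolf eats the goat")
print(generate(model, "the", 4))
```

Ask it to continue from a word it has never seen (say, “basketball”) and it simply has nothing to offer; real LLMs fail less visibly, producing fluent but unfounded text instead of stopping.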
My favorite example of this is the endless amusing variations of feeding LLMs the “wolf, goat and cabbage” problem and watching them fail, because they have seen a ton of training data of the regular problem. They’ll just flip into regular “wolf, goat, cabbage” answers even if you tell it that the wolf is a vegetarian, that the goat is dead, that you have a wolf, a goat and a basketball, and so on — it doesn’t know what a wolf, a goat or a cabbage are. You can even tell it that the farmer doesn’t need to cross the river, and it’ll still go ahead and tell you how to get everything across the river.
All good advice
That said - the more recent LLM models have improved a good bit from the days of headlines that ChatGPT recommended eating glue or rocks. One of my favorites was asking ChatGPT how long it would take to ride a bicycle from San Francisco to Honolulu and it did the math.
That said - there are a number of prompting techniques that can reduce this sort of confabulation or illogic:
“If you do not know an answer say so”
“Go step by step and think through this”
“Give an explanation of how you found your information at each step”
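Those three instructions can be combined into a single reusable prompt prefix. A minimal sketch, assuming you simply prepend the mitigations to every question; the wording mirrors the list above and is worth tuning to taste.

```python
# The three confabulation mitigations listed above, baked into one
# reusable prompt prefix. Wording is illustrative, not a tested recipe.
MITIGATIONS = [
    "If you do not know an answer, say so.",
    "Go step by step and think through this.",
    "Give an explanation of how you found your information at each step.",
]

def guarded_prompt(question: str) -> str:
    """Prepend the standard mitigations to a user question."""
    return "\n".join(MITIGATIONS) + "\n\nQuestion: " + question

print(guarded_prompt("How far is San Francisco from Honolulu by bicycle?"))
```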
Based on the capabilities of the latest chat assistant models, my usage strategy is loosely as follows (a strategy I devised with the help of chat assistants).
- 1. Advanced Prompting Techniques: How you ask the question significantly impacts the model’s ability to reason. Key techniques include:
- Chain-of-Thought (CoT) Prompting: Encouraging the model to “think step-by-step” or explicitly output its reasoning process before giving the final answer. This breaks down complex problems into manageable parts, improving accuracy on arithmetic, common sense, and symbolic reasoning tasks.
- Tree-of-Thoughts (ToT): An extension of CoT where the model explores multiple different reasoning paths (branches of a tree) simultaneously. It evaluates the intermediate steps and potential outcomes, allowing it to backtrack or choose more promising routes, making it better for problems requiring exploration or planning.
- Self-Consistency: Running the same prompt multiple times (perhaps with slight variations or using CoT) and selecting the most frequent answer among the outputs. This improves robustness by mitigating the chance of a single faulty reasoning chain.
- Least-to-Most Prompting: Breaking down a complex problem into a sequence of simpler sub-problems and solving them sequentially.
- 2. Be aware of Retrieval-Augmented Generation (RAG): While not strictly a reasoning technique itself, RAG significantly aids reasoning by providing models with up-to-date, relevant, and factual external information before they generate an answer. This grounds the model’s reasoning process in specific data, reducing hallucination and allowing it to reason over information not present in its original training data. Being aware of RAG is particularly useful when using Gemini Pro 2.5, as it self-checks current data retrievals against its training datasets.
- 3. Be aware that the best models are fine-tuned with Reasoning Tasks: Models are increasingly being specifically fine-tuned on large datasets designed to test and improve reasoning. These datasets might include:
- Mathematical word problems (e.g., GSM8K)
- Logical deduction puzzles
- Scientific question-answering (e.g., ARC)
- Code generation and debugging tasks
- Multi-step instruction following
- 4. Be aware of Agentic AI and Tool Use: A major trend is developing AI “agents” that can reason about how to solve a problem by decomposing it and deciding when and how to use external tools. This involves reasoning to:
- Understand the user’s goal.
- Plan a sequence of actions.
- Select appropriate tools (e.g., a calculator for math, a search engine for current events, a code interpreter for execution, APIs for specific services).
- Interpret the tool’s output and integrate it into the ongoing reasoning process. Models like Gemini 2.5 Pro and GPT-4 show increasing capabilities in tool use and function calling.
- 5. Be aware of Multi-modal Reasoning: The latest models (like Gemini 2.5 and GPT-4V) can reason across different modalities, primarily text and images, but potentially audio and video too. This involves understanding relationships, interpreting data presented visually (charts, diagrams), and integrating information from multiple sources.
- 6. Be aware of Neuro-Symbolic AI (a trending research area): While less mainstream in deployed systems compared to LLM techniques, research continues into hybrid approaches combining the pattern-matching strengths of neural networks with the rigorous logical inference capabilities of symbolic AI (like knowledge graphs or logic solvers). The goal is to achieve more robust, verifiable, and interpretable reasoning. The successes of LLMs made symbolic AI research far less popular, but its use is trending once again as a reinforcement technique for LLMs.
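Of the techniques above, self-consistency is the easiest to sketch in code. This is a minimal illustration with a stubbed model: the `ask` callable is a placeholder for whatever function sends a prompt to a real model (with a non-zero sampling temperature, so repeated runs can differ); everything else is just a majority vote.

```python
from collections import Counter

def self_consistent_answer(ask, prompt: str, n: int = 5) -> str:
    """Self-consistency: sample the same prompt n times, take the majority.

    `ask` is any callable mapping a prompt to a model's final answer
    string. In practice it would wrap an LLM API call; here it is a stub.
    """
    answers = [ask(prompt) for _ in range(n)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

# Stub model: one faulty reasoning chain out of five yields a wrong answer.
samples = iter(["42", "42", "41", "42", "42"])
print(self_consistent_answer(lambda p: next(samples), "What is 6 * 7?"))
# prints "42": the single faulty chain is outvoted
```

The robustness gain comes purely from sampling diversity: a single bad chain-of-thought rarely survives a vote across several independent runs, at the cost of n times the API spend.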
In summary:
Chat assistants have moved beyond their stochastic-parrot capabilities. The latest AI “reasoning models” are generally built on the most powerful foundation models (LLMs like Gemini 2.5, GPT-4, Claude 3). These work well with sophisticated prompting techniques (CoT, ToT), are grounded by external knowledge (RAG), are potentially fine-tuned on specific reasoning datasets, and are increasingly capable of planning and using tools (Agentic AI). The focus is less on a unique “reasoning architecture” and more on eliciting and structuring the reasoning process within these capable, general-purpose models. Progress is rapid, with continuous improvements in handling complexity, reducing errors, and expanding the scope of the problems AI can tackle through reasoning.
Much as I’ve been impressed with the rapid progress of Google’s Gemini Pro capabilities through its 1.5 to 2.5 models, I’ve always used multiple models, first via Perplexity. I’ve now moved to Kagi, as its $25-a-month premium package combines the advantages of private, advertising-free search with a multi-model chat assistant similar to Perplexity’s: it uses the latest Claude Sonnet and Haiku, Mistral, GPT, Gemini Pro and Llama models, and as an added bonus includes the subscription Wolfram Alpha service. Since moving to Kagi, I’ve found it to be a more useful multi-model service than Perplexity for desk research.
And much as I haven’t historically leaned on the web browser in DEVONthink for desk research, the feature set in DT4 has changed my perception of its worth when using various chat assistants. I also own DEVONagent and can’t wait to see what DEVONtechnologies has cooking with that particular product. I hate the term “game changing”, but DA boosted with the features of chat assistants should make for a potent brew.
BTW @BLUEFROG, loving the ability to use the Same prompt keywords to query secondary chat models with the previous query. Nicely thought-through interaction design.
It was something I stumbled on when testing for the documentation. In fact, I will officially add it to the documentation
And a short paragraph about Same prompt is now official for the beta 2 documentation.
Ooh, beta 2… Maybe I should ask Joe Kissell to leak the release date? By the way, now that you have released the beta, you should leak when his new version of the Take Control book is released, just to mess with him
LOL
@joekissell is too nice to pull such a prank on him