DEVONthink and DeepSeek?

This is partly just OpenAI’s marketing; Anthropic’s Claude 3 Opus, for example, did this earlier. In the end it’s possible to add this to most modern models (e.g. GPT-4o or Claude 3.5 Sonnet) using a sophisticated prompt (assuming the model is given enough time).

See e.g. https://www.reddit.com/r/ClaudeAI/comments/1fx51z4/i_made_claude_35_sonnet_to_outperform_openai_o1/?rdt=54452

Exactly. Just a different & usually better approach to “stringing” prompts together, one which requires much more computing time and is therefore much more expensive (see the pricing of o1 or Claude Opus). At least before DeepSeek-R1.

1 Like

At the very least, LLMs provide significant assistance with semantic search, especially for non-English languages. Additionally, I’m not sure if it’s appropriate to discuss this here, but I feel that DEVONsphere might be better suited to using an LLM.

The approach we use with our government customers is to import and ‘chunk’ their documents into a cloud-based search index, then use a vectoriser model to generate vectors for each chunk.

At query time, we use vector search (in our case Azure AI Search) to retrieve ‘grounding’ text that is then fed to the LLM (GPT-4o); this forms the basis for generating the answer.

A well-crafted ‘system prompt’ is needed to make sure the LLM sticks to the supplied text for the answer, although even that is no guarantee against hallucinations.
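
In outline, the generation step looks something like this. A minimal sketch, assuming the OpenAI Python SDK with an API key in the environment; `search_chunks` is a hypothetical stand-in for the vector search, and the real system prompt is more elaborate:

```python
# Minimal sketch of the grounding pattern described above.
# Assumes the openai package and OPENAI_API_KEY in the environment;
# search_chunks() is a hypothetical stand-in for the vector search.
from openai import OpenAI

client = OpenAI()

def search_chunks(question: str, top_k: int = 5) -> list[str]:
    """Placeholder: return the top_k most similar document chunks."""
    raise NotImplementedError  # Azure AI Search in our setup

def answer(question: str) -> str:
    grounding = "\n\n".join(search_chunks(question))
    system_prompt = (
        "Answer ONLY from the supplied context. If the context does not "
        f"contain the answer, say so.\n\nContext:\n{grounding}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```

The key point is that the model only ever sees the retrieved chunks, never the whole index.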

All of this is cloud-based of course, so you have to trust the platform and LLM vendor (in this case Microsoft), but the performance (speed) is good.

3 Likes

AI vendors are delighted that you have formed that impression, but it isn’t true. They are using a very large corpus of existing data to predict “what comes next.” They have no ability to “reason” about their output, even to do self-checks like “is this result physically plausible.”

I know that is the classic description.

But it has to be more complex than that because some LLMs such as Claude can clearly solve novel problems.

It is not surprising that a model with billions of parameters can solve more complex problems than a model with only millions.

looking at this more from a usage perspective:

as for many of us here, uploading big parts of the data stored in my DT databases to some cloud-based service (LLM or other) is a non-starter. at the same time, having the capability of an LLM to interact with my data would be invaluable for at least two reasons:

  • to query my data in natural language as opposed to a search syntax; I can thus interact with it much more naturally.
  • to ask more open questions of my data (which in my case means, to a certain extent, my past self or former knowledge) that I would not easily find an answer to through search queries: summarizing topics or time periods, for example.

now DeepSeek seems to be a substantial step in that direction, reducing the hardware requirements to run a pretty decent local LLM quite a bit, thus possibly enabling these (and other) use cases without the need to upload my data.

while my 64GB RAM M1 Max MBP is likely still at the higher end of DT users’ machines, I am hopeful that we have either now reached the sweet spot or are at least pretty close to incorporating the above use cases into DT, possibly in a staged approach depending on available hardware…always assuming this is where the DT team wants to take the product to begin with. :-)

1 Like

Anthropic in particular has taken a strong stance on customer privacy. They will even sign a HIPAA agreement in the USA to protect the privacy of medical records.

I think you will be sorely disappointed comparing any local LLM to Claude for document review.

2 Likes

That’s unfortunately not true. If you have the time and the spirit, try installing and running any LLM on your own device; there are plenty of tutorials on the web. Ask it random questions. You will see whether the performance of a local model is suitable for work that requires some degree of accuracy.

I remember the story of self-flying aircraft. 50(!) years ago Lockheed developed and certified automatic landing for commercial aircraft, using the computers characteristic of that era. Today, although we have much more powerful computers, the vast majority of landings are still performed by human pilots, even though landing is one of the most dangerous parts of flying.

3 Likes

For example, 2 minutes on a Mac Studio (M1 Ultra, 128 GB) for DeepSeek-R1 with 70b parameters. The answers are quite good for a local model but, due to possible hallucinations, validating the facts would require even more time, unless you know them already. But in that case, why did you ask? :wink:

In the same time one could easily perform multiple Google searches and find additional material & more information. Or search your own databases, depending on their contents, and in this case get exactly the desired results without any delay. Or search e.g. Wikipedia or other websites and let AI summarize the frequently endless articles. That’s my preferred approach, as it is a lot less likely to hallucinate and saves me a lot of reading (but I’m using inexpensive cloud models like GPT-4o mini, Claude 3.5 Haiku or Gemini Flash due to their speed in these cases).

Each tool has its own usage scenario and shortcomings.

10 Likes

@straylor Thanks for starting the thread, I am also interested in connecting some of my DT3 content to DeepSeek. My use case is a growing PDF library that is becoming increasingly difficult to fuzzy search… So I’m exploring the possibility of using an LLM running on my laptop to help me answer questions like these:

  • On which page of which book is the story of the psychological study in the mid-1900s about some kind of animal that sacrificed its children to save itself? (Real example… I have been looking for this one for ages!)
  • The book Loonshots describes two main types of innovations. Name and define them, then list other models of innovation that are described in the books in my library. Compare and contrast them.
  • I am looking for a quote in one of the books in my library that pertains to the definition of narcissism being the inability to metabolize shame relationally. What is the quote, and on which page of which book can I find it?

Unless I am missing something (wouldn’t that be wonderful!), I assume that this level of interaction with content is not available (yet?) in DT3. But this is exactly the kind of thing that is possible with Retrieval-Augmented Generation LLMs. I do not want to try and upload my entire PDF library to an online service and pay a monthly fee to search it. Thus, since my entire PDF library is indexed in DT3, my goal at the moment is to run a distillation of DeepSeek locally and connect DS to DT3. What I have so far:

  • I’m running DeepSeek (14B distillation) locally using Ollama.
  • I’ve asked it for recommendations for connecting the files in DT3 to DS so that the AI can act as my research assistant.

My current research challenge: DeepSeek reportedly does not have access to the filesystem, so I can’t feed it the path to my library of PDFs. It is, however, suggesting that it can access a web server. Unless I am mistaken, DT3 Pro has the ability to serve a database locally. I wonder… :thinking: I was mistaken. I misread the documentation and assumed it applied to Pro, but it looks like it only applies to Server… back to the drawing board…
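
One interim idea: since DT3 indexes the files in place, a small script could pull a PDF’s text straight from disk and hand it to the local model over Ollama’s HTTP API. A rough sketch, assuming pypdf is installed; the path and question are made up:

```python
# Rough sketch: extract text from one indexed PDF and ask the local
# DeepSeek model about it via Ollama's HTTP API.
import requests
from pypdf import PdfReader

pdf_path = "/path/to/indexed/book.pdf"  # hypothetical path
pages = PdfReader(pdf_path).pages
text = "\n".join(page.extract_text() or "" for page in pages)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        # Crude truncation to stay within the model's context window.
        "prompt": f"Context:\n{text[:8000]}\n\n"
                  "Question: Which two types of innovation does this book describe?",
        "stream": False,
    },
)
print(resp.json()["response"])
```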

I intend to keep hacking at this, but if anyone has any ideas or wants to explore this, I’d love to know what you think/find…

Fun times!

2 Likes

I can’t help you with LLMs, but I hope I can still offer some useful feedback.

  • On which page of which book is the story of the psychological study in the mid-1900s about some kind of animal that sacrificed its children to save itself? (Real example… I have been looking for this one for ages!)
  • The book Loonshots describes two main types of innovations. Name and define them, then list other models of innovation that are described in the books in my library. Compare and contrast them.
  • I am looking for a quote in one of the books in my library that pertains to the definition of narcissism being the inability to metabolize shame relationally. What is the quote, and on which page of which book can I find it?

Unless I am missing something (wouldn’t that be wonderful!), I assume that this level of interaction with content is not available (yet?) in DT3.

DEVONthink can’t actively process and present material like #2, but #1 & #3 seem well within the bounds of normal search. Maybe the interface of a chatbot conversation is more accessible to you, but it sounds like you could gain a lot from better use of DEVONthink’s advanced search features and a deeper understanding of query construction. Search prefixes, operators and wildcards are pretty powerful when combined.

I think the PDF text content would be enough, but it’s naturally easier if you have additional metadata to query and if you have annotations and/or notes.

I get the appeal of interrogating a pile of documents like you describe. But two things come to my mind:

  • I think you learn more—and remember it better—by engaging with material yourself rather than having it regurgitated to you. (I don’t mean this in a disparaging way, but I can’t think of a better word than “regurgitate”.)
  • Who knows when this will be possible, if at all, be it online (without extreme costs) or locally. My impression is that working this way with large numbers of PDFs still seems quite distant. In the meantime you might be better off adjusting your workflow.

For #1 and #3, it would be quick if you had these quotes as separate documents. You can easily embed a direct link to the location in the PDF.[1] Even better with some keywords/tags for search. For example, #1: psychology, study, subject-animal, child; #3: definition, narcissism, shame, relations.

For #2, if you export a relevant quote every time you come across a model of innovation in your reading, they will be easy to search and work with (tags like model and innovation seem useful). You could replicate them all to a group, or create an overview note where you wikilink to all the different quotes.

Of course this doesn’t help you retroactively, but I think it’s worth considering if you could improve your workflow here.


  1. Tools > Summarize Annotations includes it automatically, or there’s Edit > Copy with Source Link. And the contextual menu has Copy Page Link and Copy Selection Link ↩︎

1 Like

thanks to @meowky’s suggestion and your example I have now installed DeepSeek locally via Ollama. considering your 2-minute benchmark, I did not install the 70b model but only the 14b one, which so far is surprisingly responsive and gives meaningful results to general questions.

does anybody here have experience with giving DeepSeek persistent access to some of my DT data (e.g. a specific database, or ideally all documents within a group or with a certain tag)? I am assuming that full access might be overwhelming but a subset might be workable, so any pointers would be appreciated.

possibly there is a “balance” to be found by limiting the model as well as the DT data set made available to the local DeepSeek instance. in any case I am learning plenty of new stuff - in this case, what the fans of my MBP sound like… :smiley:

1 Like

I don’t have direct experience with DeepSeek, but with models like GPT-4o and GPT-4o Mini there is a limit to how much the model can process in one go, the so-called ‘token limit’, which on commercially provided models also has a cost implication.

A potential solution is called Retrieval-Augmented Generation, which queries the data, returns a subset, and presents this to the model along with your question. This means the overall token cost is reduced and the answer can be grounded in the text itself, rather than some potential nonsense that the LLM decides to invent :slight_smile:

This often involves creating a vector database that contains document fragments; the vector search then retrieves the fragments most similar to the phrase used in the question. Supabase (which runs on macOS) has support for vectors but, as you can guess, there’s some non-trivial development to be done.
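
For a flavour of what that development involves, here is a minimal local sketch of the retrieval half. It assumes Ollama is running with an embedding model pulled (e.g. nomic-embed-text); a real system would persist the vectors in a database such as Supabase rather than recomputing them on every query:

```python
# Minimal sketch: embed document fragments and retrieve those most
# similar to the question. Assumes a local Ollama server and that an
# embedding model (here "nomic-embed-text") has been pulled.
import requests
import numpy as np

def embed(text: str) -> np.ndarray:
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    return np.array(r.json()["embedding"])

def top_fragments(question: str, fragments: list[str], k: int = 3) -> list[str]:
    q = embed(question)
    def cosine(v: np.ndarray) -> float:
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    # Rank fragments by cosine similarity to the question.
    return sorted(fragments, key=lambda f: cosine(embed(f)), reverse=True)[:k]

# The winning fragments then go into the prompt alongside the question,
# which is the "grounding" step described earlier in the thread.
```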

Cloud providers like Azure have visual tooling that can handle a lot of this for you, but I’m not aware of anything running locally that would do this unfortunately.

5 Likes

This is a big part of what I mean when I tell people they need to manage their expectations regarding AI.

It’s not too hard to dip a toe into the waters. I downloaded a model (llama3.2) and built something to answer common support questions for our (now defunct) company. It used Ollama and AnythingLLM, with attached documents including two PDF manuals (about 500 pp total) and about 600 web pages written over a decade or so to address particular issues that seemed to trip up our customers.

In some ways it seems impressive, but of course (I think) it doesn’t add anything beyond the contents of the data it references. It is certainly faster at finding a solution than searching would be in most cases.

There are YouTube videos that can get you into the thick of it in ten or fifteen minutes. I found Ollama and AnythingLLM to be useful. Just remember, “Force Quit” can be your friend.

The performance on my M4 Pro MacBook is just fine.

2 Likes

I forgot to add:
You can experiment by adding one or more directories from your DT database to the set of materials referenced by your LLM. These directories will be read but not modified. I have not done this, so you are on your own.

1 Like

Are you referring to searching in DEVONthink or in general?

In general.

I’m late to a busy thread. Models are an attempt to distill large chunks of human knowledge into one data source, and two things stand out to me:

  1. I have a 64 GB M3 Max and I can only manage a context window of 8,000-10,000 tokens for many larger models.
  2. Context windows are the real challenge for local models.

I want local models for privacy. I don’t think this is viable in the short term, or possibly ever.

My Obsidian vault, with ~750K words, has a 350 MB JSON file for Obsidian Copilot, which appears to be doing RAG. My main DEVONthink DB is 40 times the size of my Obsidian vault, so the corresponding JSON file would be closer to 14 GB. This tells me it’s only practical to work with 200-300 documents at a time.
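
The rough arithmetic behind that, assuming ~1.3 tokens per English word:

```python
# Back-of-the-envelope: why a whole vault can't fit in a local context window.
TOKENS_PER_WORD = 1.3    # rough assumption for English prose
vault_words = 750_000
context_window = 10_000  # tokens, the most I can manage locally

vault_tokens = vault_words * TOKENS_PER_WORD
print(f"Vault: ~{vault_tokens:,.0f} tokens")                       # ~975,000
print(f"Roughly {vault_tokens / context_window:.0f}x the window")  # ~98x
# Only a small retrieved subset can ever sit in front of the model at
# once, hence RAG and working with a couple of hundred documents at a time.
```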

I’m curious about two possibilities:

  • Has anyone created an automation/plugin that talks to NotebookLM?
  • Any efforts to explore LLMs combined with RAG? As noted above, that would seem to make it possible to work with document subsets.

1 Like