PKM + AI Wiki for navigating and organizing DT4

See this article describing how you can combine Claude Code and Obsidian to create a self-organizing Wiki directory of content. (Below is an AI-generated summary of the article.)

I much prefer DT4 as my PKM database, but I find the approach described here very compelling. I would imagine you could recreate this functionality using DT4's AI integration instead of using Claude Code separately.

@syntagm - You’re doing great work on DT4’s LLM/MCP integration, and I wanted to get your thoughts.

Has anyone else in the community used AI in this way?


Article Summary:

Core Concept

Instead of repeatedly querying raw documents, the approach uses an LLM to build and maintain a structured wiki that accumulates knowledge over time. The key insight is that LLMs eliminate the maintenance burden that causes human-maintained wikis to fail.

Three-Layer Architecture

  1. Raw Sources - Immutable articles, papers, transcripts, notes (read-only for the LLM)
  2. The Wiki - Structured markdown pages the LLM writes and maintains (summaries, entity pages, concept pages, cross-references)
  3. The Schema - Instructions defining how the LLM ingests, organizes, links, and queries information

Three Operations

  1. Ingest - Drop in a source; LLM reads it, writes summaries, updates relevant pages, refreshes index, logs changes
  2. Query - Ask questions against the wiki (not raw documents); answers get filed back for future reference
  3. Lint - Periodic health checks to find contradictions, flag stale claims, surface orphan pages, suggest gaps
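
The article drives these three steps with Claude Code following the schema, not with a script. Purely as an illustration of the data flow, here is a minimal Python sketch under that assumption; the `llm()` helper is a hypothetical placeholder for whatever model you actually call, and the folder names anticipate the vault layout described in the next section.

```python
# Minimal sketch of the three operations over a plain-text vault.
# llm() is a hypothetical placeholder, not any specific product's API.
from pathlib import Path
from datetime import date

VAULT = Path("wiki")

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in whichever model/API you actually use")

def ingest(source: Path) -> None:
    """Read a raw source, write a summary page, and log the change."""
    text = source.read_text()
    summary = llm(f"Summarize this source for the wiki, with [[links]]:\n\n{text}")
    page = VAULT / "pages" / f"{source.stem}.md"
    page.write_text(summary)
    with (VAULT / "log.md").open("a") as log:
        log.write(f"- {date.today()}: ingested {source.name} -> {page.name}\n")

def query(question: str) -> str:
    """Answer against the wiki pages (not the raw sources) and file the answer."""
    pages = "\n\n".join(p.read_text() for p in (VAULT / "pages").glob("*.md"))
    answer = llm(f"Using only these wiki pages, answer: {question}\n\n{pages}")
    slug = "".join(c if c.isalnum() else "-" for c in question.lower())[:40]
    (VAULT / "queries" / f"{date.today()}-{slug}.md").write_text(answer)
    return answer

def lint() -> str:
    """Periodic health check: contradictions, stale claims, orphan pages."""
    pages = "\n\n".join(p.read_text() for p in (VAULT / "pages").glob("*.md"))
    return llm("Flag contradictions, stale claims and orphan pages:\n\n" + pages)
```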

Implementation Guide

  • Use Obsidian as the vault with folder structure: Clippings/, raw/, sources/, pages/, queries/, plus CLAUDE.md (schema), index.md (catalog), and log.md (chronological record)
  • Define the schema in CLAUDE.md to guide the LLM’s behavior
  • Access via Claude Code terminal
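
If you wanted to replicate that layout outside Obsidian (for example as a folder you index into DT4), a small bootstrap script could create it. The names below are just the ones listed above; nothing here is specific to Claude Code.

```python
# One-off bootstrap for the vault layout described above.
from pathlib import Path

VAULT = Path("wiki")

for folder in ("Clippings", "raw", "sources", "pages", "queries"):
    (VAULT / folder).mkdir(parents=True, exist_ok=True)

seed_files = {
    "CLAUDE.md": "# Schema: how to ingest, organize, link and query\n",
    "index.md": "# Index\n",
    "log.md": "# Log\n",
}
for name, content in seed_files.items():
    path = VAULT / name
    if not path.exists():
        path.write_text(content)
```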

Key Advantages

  • Maintenance cost approaches zero (LLMs don’t get bored updating cross-references)
  • Knowledge compounds as you ingest and query
  • Suitable for personal research, business wikis, reading books, competitive analysis, or any ongoing knowledge accumulation

Recommended Tools

  • Obsidian Web Clipper for frictionless article ingestion
  • Graph view for visualization
  • Marp for slide generation and Dataview for dynamic queries
  • Git repository for version control
1 Like

Are you aware that posting AI-generated content is explicitly discouraged here?
If you expect me to read what you post, write it yourself. If you don’t, do you expect an AI to read it?

4 Likes

That’s why I included links to the source article. Here it is again.

There is no reason why you cannot use the same concept with DT4 instead of Obsidian.

5 Likes

The one drawback to this approach for building a “second brain” that immediately comes to mind is that it would be very easy to just dump everything you come by into it.

While it would probably be cool to watch as it builds up in real time (so to speak), it’s also very likely that the ease of dumping means you would not actually process many things going into it. That would render this “brain” far less useful than deliberate input.

5 Likes

You expect others to spend their time and energy to read something that you didn’t want to spend time and energy on to write in the first place. For me, that expresses a lack of respect.

5 Likes

I completely agree about the need for friction when putting content into DEVONthink. However, I do like the idea of a wiki interface to help me navigate and review content. I’ve tried using DT4’s wiki features, but they don’t provide a navigational structure. I’ve also tried the table of contents and conversion of annotations to markdown files. These work well for small projects and a limited set of files, but not for navigating my entire database.

Regardless, I found it a very interesting use case. Even better would be if DEVONthink’s AI integration supported this use case.

1 Like

I was actually trying to save people the time of clicking through to the article just to get the context of my post. My personal preference is to stay within the DEVONthink forum, instead of forcing people to jump across applications to read external content before they know whether the topic interests them.

Just curious: how would you have gone about sharing a link to an article? How much context would you provide in your post before the link?

If I took away the AI summary would that have been better?

In my opinion: yes.

First, it was not clear to me (which might just be a cognitive limitation on my part) that you posted a summary of something that was hidden behind the link. Second, the link you posted goes to a page consisting of a ton of things, much of it being other links.

Not to mention that the “AI summary” is hardly more than the original text you seem to be referring to.

And (if I get the gist, which perhaps I don’t) the point seems to be: too much stuff for humans to read, so let the AI do that and build a wiki. And who suddenly has the time to read that?

I’m probably just too old for that. For me, there’s little point in managing information that I don’t have the time to read. Where would I find the time to read the summaries?

4 Likes

Ironic, isn’t it?

This thread is interesting, but you touch on one of the key issues of “AI” use, for (at least) two reasons. The AI of today doesn’t truly “think”, so it’s important to keep humans using it as a tool rather than outsourcing the thinking. We need to be in/on the loop. And as you imply, it’s the thinking we employ during processing, organizing, connecting dots, maintaining, etc. that really helps our own knowledge and understanding grow.

That said, I think there is a tradeoff here as well. I’ve done a lot of bottom-up building and organization of my knowledge base. I wouldn’t mind having a better wiki than what I’ve created, and it might help me make decisions faster and better (by exposing what I’m overlooking) for new items, or for finding/comparing existing items. But I do believe we need to be cautious in where we draw the line and strike the balance. Is it really a second brain where I’ve understood this stuff, or is it an offline version of Wikipedia?

5 Likes

Exactly. AI is a great tool, but there is always a balance to be had.

Hi there, I am working on a similar thing based on the setup of Andrej Karpathy. In my personal setup I am combining this second-brain concept with my DT database. DT is the leading system with all original documents, and the LLM wiki holds all the md files for much faster search. I also use MCP to connect Claude with DT, which works extremely well but is a bit slow and also consumes a fairly large amount of tokens.

@PG66

Karpathy’s system is described in the link I shared earlier, too, so I’m very much interested in what you’re doing.

Please share your progress and let me know if I can help in any way.

What might an AI-generated “wiki” do for you that DT’s search engine can’t? As I understand it, the intentionality of links in a wiki is part of the point, and that can only come from a human.

5 Likes

Thanks for posting this, Lee. I would find it useful. I rely on summary notes I have created over many years of legal propositions, key statutory provisions and the like. I understand @kewms’ point about intention, however the problem (for me, at least) is that I have so many notes created over the years which I do not have time to update when I capture a new source document. For example, I import a new legal judgment which might have 5 points that are relevant to existing topic notes organised under different subject matters. To update those notes manually, I have to navigate to subject folder 1, find the note, update the note with the case citation, link to the case and summary of the proposition derived from the case, do the same with the next note, and so on. I often import multiple cases, as well as articles, when doing my research catch-up. It just isn’t practical to manually capture and update all of my notes with the points these sources raise, so it just doesn’t happen.

I tried adapting the prompt from Andreas’ article to DT4. It worked on a test with a replicated set of topic notes and sources; however, it currently requires too much manual input to update the system notes, and it can’t update the content of existing notes (Claude says it would need an MCP). I suspect it is an idea whose time may yet come.

Without going into detail: The problem you describe is the main reason for “specialization” in science. More and more people know more and more about less and less.

So, if something changes in a particular subfield of a science, specialists have that stored in their “first brain.” Not all the details, but the fact that these changes might be important. They then take the details from their “second brain.” :joy:

That’s probably also one of the reasons why “first brains” talk to each other when they don’t understand or don’t know something. So my trick is to know as many specialized “first brains” as possible, instead of trying to build a “second brain.” :slightly_smiling_face:

3 Likes

The wiki is less for consumption by the human’s first brain than for your LLM to “remember” the collected knowledge in your DT4 database. From Karpathy:

Most people’s experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There’s no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The RAG approach is my current workflow (and limitation) when working with DT4’s LLM integration:

  • Point the LLM at a selection of files and ask questions.
  • This is useful for point-in-time (chat session) questions, but there is no memory across sessions (see the sketch after this list).
  • NOTE: this doesn’t mean you offload the desirable friction of processing new content (bookmarks, PDFs) with annotations and tags. You can use this system with just your own data (notes and annotations you’ve created)
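
A cheap way to work around that no-memory-across-sessions limitation, in the spirit of the article’s Query operation, is to file every answer back into a folder that DT4 indexes and feed those notes back in next time. A rough sketch; the `ask_llm()` call is a stand-in for whatever chat integration you use, not DT4’s actual API.

```python
# Sketch: persist LLM answers so later sessions start with prior context.
# ask_llm() is a stand-in, not DT4's actual chat API.
from pathlib import Path
from datetime import datetime

QUERIES = Path("wiki/queries")  # e.g. a folder indexed into a DT4 database
QUERIES.mkdir(parents=True, exist_ok=True)

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in whichever chat integration you use")

def ask_and_file(question: str) -> str:
    # Feed prior Q&A notes back in, so knowledge accumulates across sessions.
    prior = "\n\n".join(p.read_text() for p in sorted(QUERIES.glob("*.md")))
    answer = ask_llm(f"Prior notes:\n{prior}\n\nQuestion: {question}")
    stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
    (QUERIES / f"{stamp}.md").write_text(f"# {question}\n\n{answer}\n")
    return answer
```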

Karpathy’s system is analogous to an LLM applying Progressive Summarization to your documents.

The examples I’ve referenced earlier use Obsidian and require everything to be converted to Markdown format. I use DT4 because I can capture ANY document type (Word, spreadsheet, PDF, web archive). That’s why I’m interested in PG66’s work bringing Karpathy’s system to DT4.

This is a new and very hot topic over the last week and a problem that hadn’t been solved - at least in such a low-tech fashion. Effectively, you end up with something that behaves like a small language model trained on your docs, using just text files + an LLM.

Here’s a useful demo using a local LLM (Gemma) and a local web interface (100% private) to enable your LLM to have more context (memory) about your PKM topics. Larger context windows (tokens) alone wouldn’t achieve this.
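
For a fully local setup along those lines, a minimal example could serve Gemma through an Ollama-style endpoint on localhost; that server choice is my assumption, not necessarily what the demo uses.

```python
# Sketch: query a locally served model (here assumed to be Gemma via an
# Ollama-style endpoint), so nothing leaves your machine. Names are assumptions.
import json
import urllib.request

def local_llm(prompt: str, model: str = "gemma") -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(local_llm("Summarize my notes on progressive summarization."))
```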

PG66 - please keep us posted on your progress.

Thanks @leehammond for sharing – and also posing the interesting question of how to translate such a system, or its semblance, into the DT-universe!

The interesting part to me is the insertion of “the schema” (…there are other definitions of “schema”, but the term does its job here, IMO):

– Conceptually it’s basically baking (“programming”) one’s intention, as to how the knowledge system should behave, compile and ‘metabolize’, into the automated, LLM-driven system.
That is the really interesting part stated in this conceptual scheme. And also the interesting part to inquire into further, IMO…

Basically, IMU, it is a high-level replication of what everyone “architecting” his/her own PKM system does anyway: devise things like adequate ontologies, index systems, relational structures (on different levels), relevancies, basic notional structures of the system, preferred affordances etc.
And it’s like what people do with any tool anyway: a(ny) tool basically is “intention” intentionally baked into an external system. Just now, with things like agents and LLMs (quasi-semantic transformation systems), things get more liquid, ‘active’ (in a way) and even dynamic compared to a hammer (or a notebook with a personal index system etc.).

– But they also get more complex, of course. This is why the question of handling the schema is tricky, and interesting at the same time. This is where all the modulation of one’s ‘systemic intention’ happens – vis-à-vis a tool that is a) built by outside forces at macro scope, and b) too complex for anyone to really cognitively grasp from the inside (the “tool” used here; we know that not even the system architects really understand ‘their’ LLM machines)…

So, I would be interested to hear more of your personal insights/experiences around devising, tuning and working with the “schema” – or, for that matter, any interesting reference material that you came across on your travels around the ‘AI Wiki’ (even though I remain sceptical of the loose ‘AI’ label in general… as well as of the metaphoric use of “brain” in “Second Brain”, since every “Second Brain” concept really talks about augmentation, not about ‘replicating brains’ – at least not in earnest☺️).

Thanks again for sharing your insights & impulses here!

PS: I do think the excellent search automation scripts graciously shared by @pete31 would help in any such analogous system in DT. In a way, I have considered/used them as a proto-automated wiki system ever since I discovered them here on the forum…

PPS: you are lucky your post still stands! I once got flagged for a less flagrant, even more contextualized quote produced by an LLM :sweat_smile:. While automation is king in the DT-forum world, LLM automation – even if produced/used intentionally – is largely something to burn yourself with here… :fire: :sun:

@lerone – that is a very relevant script you referenced from @pete31. It comes much closer to what I described but still has the same memory/cold start problem I am trying to address.

:warning: Caution: Regex-searching is an expensive operation. This script should not be used with simply selecting all result records. If possible select only those records you’re interested in.

This works for project/task-based research but not for creating a persistent, queryable system that retains ‘context’. For example, DT search is very powerful, but it still starts from scratch, with no memory of prior searches.

You are correct that this resembles how humans build up their schema, and about the value that process brings to remembering information. I do this with my professional DT4 database because I shouldn’t skip this step if I truly want to know something. However, that doesn’t apply to my other DT databases’ areas of interest – hobbies, health, music – for which I capture information and bookmarks.

Re: the responses to this thread. Honestly, I’m perplexed by all the complaints since I posted this to the Artificial Intelligence section of this community.

Almost all responses seem to be against the use of AI. It is an optional feature of DT, and I posted it to share and learn from others using the LLM features of DT4, really not to debate whether to use those features.

Finally, the only AI summary I posted was to compress the ideas (but not lose meaning) from the source post, which, as @chrillek noted, was behind a Substack post and mixed lots of unrelated content. I also called it out as AI-generated and provided the source link. Given this context, what would have been the “right way” to create this topic?

@lerone – I do appreciate the thoughtful feedback - no negative tone and constructive ideas like the @pete31 scripts.

My goal in posting here was to find a like-minded group of DT4 users who are interested in building, or have built, a system similar to what I described (thanks @PG66). The amount of discussion about the value of AI, about this specific goal, or about how I posted the topic makes up the lion’s share of the responses, which seems unnecessary. If I’m violating forum rules, I’d rather someone DM me so this thread stays on topic.

Thanks again.

2 Likes

Yes, agreed – the script I referenced doesn’t get to the proactive ‘stance’ one can emulate with LLMs, agents etc. Nevertheless I value it, and thought it would be helpful in this context – also in terms of bridging the “old” (pre-external-“AI”) and the new “LLM” world of DT.

In a way, what you bring up was prefigured in the script and the thinking/mindset that led to it – and in its widespread appraisal in the forum (e.g. ways toward a kind of iterative filtration/condensation). It seems noteworthy to me that it allowed for actualization/updating with one click. In that way it was a hybrid of dynamic automation (what you are pointing to) and ‘intentional (re-)triggering’.

– In that respect, and from this perspective, the thinking you bring forward here should resonate even with more traditional DT users – as such scripts, intelligent folders (a kind of ‘agents’ avant la lettre) and all such things have always been part of DT. Just as AI itself (see the DEVONthink team’s new marketing pivot of also labelling their semantic engine ‘AI’ nowadays)…

I do also still find this “old” script helpful, as it allows you to concentrate the search space the LLM then operates on quite effectively. So, for different “reasons of economy” (financial, as well as in terms of a global ‘economy of resources’), I find it an interesting strategy to think of a system that uses purely local resources for finding and pre-filtering/summarizing the broad content base, and only in a second step fires up the global LLM machinery on top…

Regardless of such aspects, and given that there is still a paradigm shift involved in “programming” intentions or intended processes (systems of ordering, referencing etc.) into one’s knowledge system via LLMs, I am still very interested to learn how you/others approach the “schema” element you mentioned/referenced in your OP. It seems the problems and challenges of intentionally building this around the triple points of given/found/codified knowledge systems, one’s personal knowledge culture/system, and one’s “real” intention system are intriguing, and an interesting challenge…

As to the kind of replies and the tendency not to discuss content but form, and to rhetorically ‘box in’ the discussion/thinking space: there are – IMO/IME – different elements to it. Some of it is opinion built from a common history of entrenched parts of the DT community and a certain (sometimes narrow) ethos of coding/programming/text-processing etc.

Some of it is, of course, genuine conviction and scepticism towards unfettered ‘AI’ ideology. And there are some good reasons for it – re. the question of ‘personal knowledge systems’ [FN*], and also in terms of what platform capitalism, ‘digital barons’ and some hyped posthumanists and accelerationists do with/via ‘AI’ in terms of digital/data politics and ethics etc.

So, things need to be discussed, and ways need to be found through the ‘AI jungle’. Also re. DT – not least because, as you correctly point out, the new and wide-ranging integration of external LLMs into DT is somewhat at odds with a certain stance (very ingrained in parts of the forum) that dismisses AI on principle and wholesale, or in purely globally framed ethos discussions…

After all, why should using AI in one’s personal “intention” space/system (as this is basically what a PKM *is*) be tolerated and even actively enabled (‘propagated’), but be morally banned outright in ‘inter-personal’ spaces (forums), or when it comes to using new possibilities of LLMs (vs the inbuilt ‘AI’)?!

Then, as in every forum, there is some cultural policing as to what is deemed ‘ok’ or interesting to say, propose or ask. Every forum has its (unwritten) codices, mainly represented/enacted by the core of ‘power commenters’. This is often tough for fresh voices/thinkers who come from a different angle or represent other ‘schools of thought’. And, as is well known, this regularly leads to some forms of “group think” and the much-studied “echo chamber” effects… including (rhetorical forms of) forum policing in every forum that is not actively guarding against such encrustation.

Don’t let yourself get distracted. As you hint, I think it’s a good policy/strategy to concentrate on those voices who are interested in one’s original proposition, or at least follow the arguments in good faith and with an open, respectful mind – and not get dragged into sideways and meta-discussions by voices who don’t display basic forms of acknowledgement for different positions, interests and ideas, or even for the actual topic set by a thread’s OP.

After a while you will find it easy to spot those contributions that are purely formal – and not engaged in substantial, on-topic exchange.

It’s still worth discussing with all those in the forum who have a genuine interest in one’s propositions and ideas. (Even if they sometimes remain less visible :grinning_face_with_smiling_eyes:)

––

FN: Then, there is much more complexity to ‘intention’ in relation to knowledge and communication than is acknowledged/accounted for here and in similar forum discussions… just ask any psychologist or sociologist. But a reductive notion of ‘intention’ is part of a lot of thinking and communication, especially when it comes to AI (vs. other tool sets)… or forum discussions.

– see

Alicia Juarrero – 2000, Dynamics in Action: Intentional Behavior as a Complex System (PDF; full book also available)

William E. Connolly – 2011, The Complexity of Intention (on JSTOR)

… just to start

1 Like