Hi all, I created a little MCP server for managing DEVONthink through AI agents. I know this is similar to what DEVONthink can already do through AI chat, but it’s nice to have an agent that can interact with DEVONthink, use other MCP servers to perform other actions, and feed the results back into DEVONthink again.
It’s still very experimental and early, but the initial set of tools seems to be working quite nicely already! At least I couldn’t find anything exploding, but use at your own risk.
Core Operations:
devonthink:is_running - Check if DEVONthink is running
devonthink:get_open_databases - List all open databases
Record Management:
devonthink:create_record - Create new records (groups, markdown, bookmarks, etc.)
devonthink:delete_record - Delete records permanently
devonthink:move_record - Move records between groups
devonthink:rename_record - Rename records
devonthink:get_record_properties - Get detailed metadata for records
devonthink:get_record_content - Read record content
Search & Discovery:
devonthink:search - General text-based search with comparison options
devonthink:lookup_record - Find records by specific attributes (filename, URL, tags, etc.)
devonthink:list_group_content - List contents of specific groups/folders
devonthink:current_database - Get the currently selected database
devonthink:selected_records - Lists all records currently selected in DEVONthink
Web Import:
devonthink:create_from_url - Create records from web URLs (formatted notes, markdown, PDFs, web documents)
Tagging:
devonthink:add_tags - Add tags to records
devonthink:remove_tags - Remove tags from records
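For anyone who hasn’t looked under the hood: an MCP tool call is just JSON-RPC over stdio. A hypothetical `tools/call` request for the search tool might look roughly like this (the exact tool name and argument shape depend on the server’s registered schema, so treat this as a sketch):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "invoice 2025"
    }
  }
}
```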
Happy for any feedback on what else to add or what’s not working yet.
What I noticed (and that does not imply criticism, but you asked for feedback):
Why is all the code written to run asynchronously? I know that Promises are very hip, but what exactly is supposed to be happening while the promises are waiting to be fulfilled or not?
What is the purpose of this exercise? It appears that it provides a (fairly contrived) wrapper around the JXA functions of DT. Which everybody can call directly from a JXA script, anyway.
applescript/execute.ts is a wrapper for osascript -l JavaScript – so the directory might better be named javascript
In this file, replace(/'/g,"''") might be written more clearly as replaceAll("'","''") – you do not need a regular expression to replace a single character.
Same file: The error message refers to AppleScript, which is incorrect here
Finally, I’m not sure if result will contain a JSON string. If I understand the code correctly, stdout should just contain the output of the osascript run, which in general is not JSON. I think.
addTagsTo: I don’t see why one needs a database in AddTagsSchema. If one adds tags to a record, the record’s database a) is irrelevant and b) can be inferred from the record.
The code to set the tags is overly complicated: record.tags = record.tags().concat(additionalTags) should suffice. Using a Set seems compelling because it weeds out duplicates, but I’m confident that DT takes care of that anyway.
“classify.ts”: The line targetDatabase = databases.find(db => db.name() === "${databaseName}"); looks overly complicated to me. I’d suggest a try block try { targetDatabase = app.databases[${databaseName}](); } catch (e) { throw new Error('…') }
The same file uses different semantics than DT’s classify method: in DT, a missing database parameter means “all databases”, not “current database”. I’d suggest sticking with that.
This check if (targetDatabase) { seems a bit pointless to me: either targetDatabase hasn’t been set by the user, in which case you set it to currentDatabase, or the user set it anyway. So the condition is always true.
This ${comparison ? `classifyOptions.comparison = "${comparison}";` : ""} ${tags ? `classifyOptions.tags = ${tags};` : ""} is plain ugly (sorry). Use if statements to set the classifyOptions. Ternary operators here take too long to understand, imo. Not to mention that setting an option to an empty string is not the same as not passing it at all – the latter triggers the default, the former not so much.
I don’t get this line at all: type: proposal.recordType ? proposal.recordType() : "group" Did you ever encounter a record without a recordType?
I stop here with just one question: did you write all that code, or did an AI spew it out?
Why is all the code written to run asynchronously? I know that Promises are very hip, but what exactly is supposed to be happening while the promises are waiting to be fulfilled or not?
Since when are promises “hip”?
In the end it doesn’t really matter: the MCP server runs over stdio, and AppleScript is blocking most of the time anyway (yes, apps have an internal event queue, etc., but it’s still synchronous).
Agents also can’t execute multiple tools at once, and if multiple agents interact with it at the same time, each gets its own instance, because the server runs over stdio.
What is the purpose of this exercise? It appears that it provides a (fairly contrived) wrapper around the JXA functions of DT. Which everybody can call directly from a JXA script, anyway.
I can’t tell if this is a serious question. Are you asking why one would expose the JXA interface through MCP so that an agent can interact with DEVONthink?
What would be the alternative? Have an agent write JavaScript and then execute osascript on the terminal?
That would be error-ridden, for one thing, and would allow arbitrary code execution. The agent would also need to know the entirety of DEVONthink’s AppleScript interface at all times, which would overload its context.
With MCP you turn any agent with MCP support into a smart AI assistant that can search, manage, and organize your DEVONthink databases, while also interacting with other MCP servers for other things, e.g. read your emails, find matching documents in DT, organize your files into groups, compose replies based on DT context, etc.
Thanks for the code review! I think it would be better on GitHub
To abbreviate: I’m sure there are people for whom this project is useful. I’m not one of them.
Asking for feedback here and then saying that you want it elsewhere is a bit confusing, imo. As is seeing you dodging the question about the author of the code.
Okay, let me ask this back: have you ever used MCP? If you don’t see the value of things like this, I’d guess you haven’t yet.
In that case I would recommend just trying it before passing judgement. See if any of your AI apps support MCP connections; Claude Desktop does.
MCP is huge right now and allows you to hook into all kinds of things (apps, services, etc) through agents to combine those intelligently. The purpose of this project is the same as the in-app chat agent within DT4, so if you find that useful, then this will be like that, just with more functionality and more connections to other things
It’s like having a smart personal assistant. You could ask the agent to organize your entire inbox, and it can rename your files intelligently and move them into the correct places.
Okay, then I appreciate the critical and cynical comments, but if this is clearly not relevant to you, then maybe don’t comment on it, as you’re not the target audience.
You comment on all of my AI tool releases on this forum with cynicism and it’s not nice to always have to justify everything.
I fully get that it’s not for everyone, but this release is very specific to people using AI agents that support MCP, or people who already find the DT4 chat useful.
Other tools that I posted in the past, like renaming files through chat queries, or classifying files through an AI agent, have been integrated into DT4 as first party tools, so there clearly is a demand. Just the existence of the chat tool within DT4 should be proof enough that some people find this of value.
This is neat; thanks. How smartly have you found Claude using the commands? E.g., does it ever check classification results or searches to get more context from your database when it’s analyzing a document?
I’m interested in MCP and agents but haven’t tried them yet. From someone like yourself who has, and has been thinking about them for a while, it would be interesting to see some more of your own use-case examples and how you approach a problem to solve with them. Thanks for sharing.
Before making this open source, I started using Claude specifically (extended thinking mode) to manage my database. I’ve been using this for a few weeks with great success, and it’s nice to have it included in my Claude subscription without paying for extra API usage on top. I also tried it with Gemini 2.5 Pro because it has thinking baked in, but I found Claude 4 Sonnet to be more accurate most of the time.
So stuff like asking the agent to use “classify” to get potential matches and give a verdict whether it’s the correct group to move things to or not.
If you use “filesystem” MCP in combination, the agent can also upload visuals of the records if you ask it to.
I’m using a strong system prompt that explicitly explains my naming scheme and where to move things, and I find it very accurate (that is, Sonnet 4 with extended thinking enabled), partly because I also use the built-in DT4 AI features “classify” and “compare” to leverage the DT4 index.
This is my current system prompt, followed by a sub-prompt that specifies naming schemes across my database and instructs the model to find groups within DT4 through tool use. It was generated with Claude itself, and whenever it makes new discoveries, I update the system prompt to reflect them.
---
Act as a secretary and file organizer for DEVONthink document management. Your primary job is to read, update, and manage DEVONthink databases through the available devonthink tools.
File Organization Standards
Naming Convention:
General format: YYYY-MM Description (e.g., “2025-07 Germany Trip Documents”)
Exact date format: YYYY-MM-DD Description (when specific date is relevant, e.g., “2025-07-18 Concert Ticket”)
Maintain original language for official documents
Core Operating Principles
Safety First:
NEVER make assumptions about document placement or organization
ALWAYS ask for clarification when uncertain about categorization
NEVER move or delete items without explicit permission
ASK for permission before performing any destructive operations
Discovery and Navigation:
Use devonthink:get_open_databases to identify available databases
Use devonthink:list_group_content to explore folder structures
Use devonthink:search to locate documents and folders
Always verify current structure before making organizational decisions
Powerful AI Tools
Classification:
Use devonthink:classify to get AI-powered destination suggestions for documents
This tool analyzes document content and suggests appropriate folders based on existing organization patterns
When in doubt, always use this tool to understand where similar documents could be in this database. This is a very powerful discovery tool and should be used often
Comparison:
Use devonthink:compare to find similar documents and identify potential duplicates
Essential for maintaining clean organization and avoiding redundancy
When in doubt, always use this tool to understand where similar documents could be in this database. This is a very powerful discovery tool and should be used often
Document Management:
Use devonthink:get_record_properties for detailed document information
Use devonthink:get_record_content to read document contents when needed
Workflow
Always start by understanding the current database structure
Use AI classification tools to guide organization decisions
Verify suggestions against established folder hierarchy
Seek explicit permission before moving or organizing documents
Maintain consistent naming conventions and folder structure
Your role is to be helpful, thorough, and extremely cautious with document management operations.
By the way, the REAL fun happens when you plug this MCP server into Claude Code and then use claude-flow to spin up an agent swarm. Now your DEVONthink data can be analyzed at scale using as much Anthropic spend as you want to throw at it.
Okay I was curious and wanted to try this. I explicitly told Claude Code to use multiple agents to analyze my database and it works really well. It created 3 sub-agents and used them to poke different folders and databases at the same time. Of course I only gave it permission for read tools
I think Gemini CLI would be perfect for this because of its huge context window (something like 1M tokens, if I remember correctly). It could gobble up many files, read their content, and then provide a verdict on their data.
Add this to your ~/.gemini/settings.json to disable all telemetry and disallow usage for training the model. This only applies to the individual plans; usage on the Standard plan through Google Cloud is already excluded from training.
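The snippet I mean is something along these lines (key names are from the Gemini CLI settings docs as I remember them; the schema has changed between releases, so double-check against the current documentation before relying on it):

```json
{
  "usageStatisticsEnabled": false,
  "telemetry": {
    "enabled": false
  }
}
```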