It depends on what kind of thing you need done. For textual analysis, I'm fortunate to have a local machine that can run some very large models, and I've learned enough MLX and Core ML to get some really performant local models on my home network (Qwen3-235B is perhaps my favorite to run locally... the latest release is very good with the languages I use, and it picks up a lot of the nuances that the frontier models catch). I also run a very large Qwen3 Coder model for, you guessed it, code-related tasks.
I really don't get everyone's fascination with OpenAI... even GPT-4.1 is a snooze fest. I find Claude even worse. Claude has a nasty habit of starting down a losing path and walking itself into dead ends, burning a million tokens before admitting "You're absolutely right" when you inform it that it just wasted half a gigawatt-hour of compute. At other times, Claude can be very good, so I can see why people like it when it's on. It's just too bipolar for me, and too rambling.
I don't see anyone talking about Grok, and I'm sure there are all sorts of opinions based on the shenanigans they play with the free public model, but SuperGrok 4 is insanely good and stable, especially at iterative conversation and long context. It's also perhaps my favorite for technical papers and professional topics. It will quickly adjust tone if I give it hints as to whether I want cheeky, sarcastic, or light banter, and I think it catches nuance better than the other models. The frustration with Grok is its on-and-off tool problems: one day PDF artifacts are broken, the next day you can get a PDF but markdown renders inline instead of in artifacts, and so on. But if I need frontier-scale work on something, it can't be beat, especially on hairy, complex, nasty research into heavy stacks.
I've tried Gemini, and while it can hang in there, I find its responses flat, and I notice that it silently ignores a lot of nuance. For legal work, especially when pulling together perspectives, that's not helpful to me.
I have Copilot Pro (Microsoft's ChatGPT) for work, and I thought it was a dog last year: it was obvious that Microsoft was cutting corners on compute, and it was also the laziest model of them all. In recent months it's really changed, though, and it's fairly obvious they've put some kind of router behind it that hands certain tasks off to hidden models that are pretty good.
Sometimes I ask the LLM to play rivals with a document or a set of research: give it some roles to play against a topic (in very simple terms: how would a lawyer, a chemist, and an accountant react to this?). Again, SuperGrok seems to handle this kind of task best of the frontier models, although ChatGPT/Copilot can sometimes hang in there too. Qwen3-235B and Kimi K1 also do really well at tasks like that; any of them seem to handle iterative prompting and multi-turn conversations really well.
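In case anyone wants to try the "rivals" trick themselves, here's a minimal sketch of how I frame that kind of request as a single chat prompt. The role list, wording, and function name are purely illustrative, not any model's official API:

```python
# Sketch of a "rivals" prompt: ask one model to critique a document from
# several competing professional viewpoints in a single request.

def build_rivals_messages(document: str, roles: list[str]) -> list[dict]:
    """Return an OpenAI-style messages list framing a multi-role review."""
    role_line = ", ".join(roles)
    system = (
        "You will review the user's document from several rival perspectives. "
        f"For each of these roles, write a short reaction in character: {role_line}. "
        "Let the roles disagree where their professional instincts differ."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": document},
    ]

# Example: the lawyer/chemist/accountant panel from above.
messages = build_rivals_messages(
    "Draft licensing agreement for a chemical patent...",
    ["a lawyer", "a chemist", "an accountant"],
)
```

Any chat model that handles multi-turn conversation well will take a messages list like this; the interesting part is telling the roles up front that they're allowed to disagree.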
For embedding, document processing, and other tasks, I tend to use some special-purpose locally hosted models. I didn't get into image analysis and the like above, but there are plenty of other models for all that. And having the chat window in DT able to connect so easily to LM Studio running on the home network makes it a breeze to have that kind of flexibility so conveniently! I think the best answer is to try them on different tasks; you'll quickly settle on which one does what best for you.
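For anyone curious about the LM Studio hookup: LM Studio exposes an OpenAI-compatible HTTP server (on port 1234 by default), so anything on the network that speaks the OpenAI chat format can talk to it. A minimal sketch, where the hostname `homeserver.local` and the model id `qwen3-235b` are placeholders for whatever your own setup uses:

```python
# Minimal sketch of querying LM Studio's OpenAI-compatible server from
# elsewhere on the LAN. The hostname and model id are placeholders.
import json
import urllib.request

# 1234 is LM Studio's default server port.
LMSTUDIO_URL = "http://homeserver.local:1234/v1/chat/completions"

def build_payload(prompt: str, model: str = "qwen3-235b") -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask_local_model(prompt: str) -> str:
    """POST the request to LM Studio and return the assistant's reply."""
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# ask_local_model("Summarize this clause...")  # works once the server is reachable
```

Because the endpoint is OpenAI-shaped, any client that lets you override the base URL (DT's chat window included) can point at the same server, which is exactly what makes the home-network setup so convenient.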