It depends on what kind of thing you need done. For textual analysis, I'm fortunate to have a local machine that can run some very large models, and I've learned enough MLX and Core ML to get some really performant local models on my home network (Qwen3-235B is perhaps my favorite to run locally... the latest release is very good with the languages I use, and it picks up a lot of the nuances that the frontier models catch). I also run a very large Qwen3 Coder model for, you guessed it, code-related tasks.
I really don't get everyone's fascination with OpenAI... even GPT-4.1 is a snooze fest. I find Claude even worse. Claude has a nasty habit of starting down a losing path and walking itself into dead ends, burning a million tokens before admitting "You're absolutely right" when you inform it that it just wasted half a gigawatt-hour of compute. At other times, Claude can be very good, so I can see why people like it when it's on. It's just too bipolar for me, and too rambling.
I don't see anyone talking about Grok, and I'm sure there are all sorts of opinions based on the shenanigans they play with the free public model, but SuperGrok 4 is insanely good and stable, especially at iterative conversation and long context. It's also perhaps my favorite for technical papers and professional topics. It will quickly adjust tone if I give it hints as to whether I want cheeky, sarcastic, or light banter, and I think it catches nuance better than the other models. The frustration with Grok is its on-and-off tool problems: one day PDF artifacts are broken, the next day you can get a PDF but markdown renders inline instead of in artifacts, and so on. But if I need frontier-scale work on something, it can't be beat, especially on hairy, complex, nasty research into heavy stacks.
I've tried Gemini, and while it can hang in there, I find its responses flat, and I notice that it silently ignores a lot of nuance. For legal work, especially when pulling together perspectives, that's not helpful to me.
I have Copilot Pro (Microsoft's ChatGPT) for work, and I thought it was a dog last year: it was obvious that Microsoft was cutting corners on compute, and it was also the laziest model of them all. In recent months it's really changed, though, and it's fairly obvious they've put some kind of router behind it that hands certain tasks off to hidden models that are pretty good.
Sometimes I ask the LLM to play rivals with a document or a set of research: give it some roles to play against a topic (in very simple terms: how would a lawyer, a chemist, and an accountant react to this?). Again, SuperGrok seems to handle this kind of task best of the frontier models, although ChatGPT/Copilot can sometimes hang in there too. Qwen3-235B and Kimi K1 also do really well at tasks like that; any of them seem to handle iterative prompting and multi-turn conversations really well.
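In case anyone wants to try the "rivals" trick themselves, here's a minimal sketch of how I frame that kind of request as a single chat prompt. The role list, wording, and function name are purely illustrative, not any model's official API:

```python
# Sketch of a "rivals" prompt: ask one model to critique a document from
# several competing professional viewpoints in a single request.

def build_rivals_messages(document: str, roles: list[str]) -> list[dict]:
    """Return an OpenAI-style messages list framing a multi-role review."""
    role_line = ", ".join(roles)
    system = (
        "You will review the user's document from several rival perspectives. "
        f"For each of these roles, write a short reaction in character: {role_line}. "
        "Let the roles disagree where their professional instincts differ."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": document},
    ]

# Example: the lawyer/chemist/accountant panel from above.
messages = build_rivals_messages(
    "Draft licensing agreement for a chemical patent...",
    ["a lawyer", "a chemist", "an accountant"],
)
```

Any chat model that handles multi-turn conversation well will take a messages list like this; the interesting part is telling the roles up front that they're allowed to disagree.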
For embedding, document processing, and other tasks, I tend to use some special-purpose locally hosted models. I didn't get into image analysis and the like above, but there are plenty of other models for all that. And having the chat window in DT able to connect so easily to LM Studio running on the home network makes it a breeze to have that kind of flexibility so conveniently! I think the best answer is to try them on different tasks; you'll quickly settle on which one does what best for you.
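For anyone curious about the LM Studio hookup: LM Studio exposes an OpenAI-compatible HTTP server (on port 1234 by default), so anything on the network that speaks the OpenAI chat format can talk to it. A minimal sketch, where the hostname `homeserver.local` and the model id `qwen3-235b` are placeholders for whatever your own setup uses:

```python
# Minimal sketch of querying LM Studio's OpenAI-compatible server from
# elsewhere on the LAN. The hostname and model id are placeholders.
import json
import urllib.request

# 1234 is LM Studio's default server port.
LMSTUDIO_URL = "http://homeserver.local:1234/v1/chat/completions"

def build_payload(prompt: str, model: str = "qwen3-235b") -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask_local_model(prompt: str) -> str:
    """POST the request to LM Studio and return the assistant's reply."""
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# ask_local_model("Summarize this clause...")  # works once the server is reachable
```

Because the endpoint is OpenAI-shaped, any client that lets you override the base URL (DT's chat window included) can point at the same server, which is exactly what makes the home-network setup so convenient.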