GPT-4.1 mini vs Claude Sonnet 4

I am curious about feedback and preferences among those who have tried both GPT-4.1 mini and Claude 4.

Claude Sonnet 3.7, and more recently Claude Sonnet 4, have been my go-to LLMs for some time for reviewing academic or professional documents. I use them to get an overview of very large documents or document sets and to effectively create an index to the documents via hyperlinks. I do not use them to author any content, and I do not rely on any of the information before I confirm it in the source. With those caveats, AI has been a gamechanger.

Occasionally I have used Gemini Pro 2.5 for larger documents that exceed Claude’s 200K context window; but I find Gemini to be more prone to hallucination than Claude, so it is not my first choice.

I tried out GPT-4.1 mini recently and I have been impressed:

1 - Almost as good as Claude at analyzing factual documents

2 - No appreciable hallucination that I have seen so far - certainly no more than Claude

3 - 1 million token context window for GPT-4.1 mini vs. 200K for Claude

4 - GPT-4.1 mini costs about 10% of the cost of Claude 4 Sonnet

So GPT-4.1 mini is basically 90% as capable for 10% of the cost, with a five-times-larger context window, compared with Claude 4 Sonnet.

Any other thoughts or comparisons?

For simple and frequent tasks like summarizing, I use the cheapest and therefore fastest models, like GPT-4.1 Nano or Gemini 2.5 Flash Lite. Claude 4 Sonnet and Claude 4 Opus are still my preferred choice for more complex tasks, e.g. deep web research, coding, etc. But for me the context window isn’t as important as it is in your case :wink:

Overall, I cannot even remember the last hallucination, as I almost never use the trained knowledge on its own, and as DEVONthink filters out irrelevant results of web/Wikipedia searches (irrelevant context can easily confuse LLMs).

2 Likes

Hallucinations have certainly gotten rarer as LLMs and DT have improved.

I find the remaining situation that most often leads to hallucination is being too specific in a prompt.

If, for example, I ask an LLM to “Find 5 articles rebutting this paper,” it might only find 3 good ones and may make up 2 more. Hallucinations seem much less likely if I don’t constrain its response to a specific quantity of words or ideas.

2 Likes

Still going back and forth between ChatGPT and Claude Sonnet, but I seem to be using the latter more.

I’m working on a large novel, lots of chapters, and one advantage is that Claude offers Project Knowledge, a place to upload files you can ask Claude to look at when responding to a query. It helps overcome the problem of the knowledge of one chat vaporizing when closed – I can go back and look at a chat, but Claude doesn’t remember it. It doesn’t always work very well, though. I can ask Claude to search through all those files for, say, a character name, and “he” finds less than half of the occurrences. (Anybody got a gender solution here? I know the name Claude can be male or female. Maybe this is a “they” situation?) And it’s awkward to talk about having a chat with Chat.

I get cut off more often in Claude than in Chat, and have to wait 4 or 5 hours to get back on. Sometimes that’s welcome; I should take a break.

I like the way Claude writes, have seen fewer hallucinations, and when they occur, the apologies and offers to rein in his imagination are really amusing.

I use DuckDuckGo’s Assist for casual queries. Tend to like Chat for more factual questions, but don’t always trust the answers.

1 Like

Considering that all the AIs are an agglomeration of human knowledge and technology, “they” would not be inaccurate.

I wish Douglas Adams were still with us. I think of his joke about needing new verb tenses for when they invent time travel at times like this.

I wish “s/he” had taken firmer hold, with the slash as a stand-in for “and,” “or,” or “everybody in between” – and having the s come first is a nice corrective. At least no one would be confused about whether it’s singular or plural … which I am, for all that I understand why “they” is being used. Too bad “one” is so posh.