I am curious about feedback and preferences among those who have tried both GPT-4.1 mini and Claude 4.
Claude Sonnet 3.7 and, more recently, Claude Sonnet 4 have been my go-to LLMs for some time for reviewing academic or professional documents. I use them to get an overview of very large documents or document sets and to effectively create an index to the documents via hyperlinks. I do not use them to author any content, and I do not rely on any of the information before I confirm it in the source. With those caveats, AI has been a gamechanger.
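For anyone who wants to reproduce that kind of indexing with a script rather than an app, here is a minimal sketch using the Anthropic Python SDK. The model ID, file name, and prompt wording are my own assumptions, not the exact workflow described above.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("document.md") as f:  # hypothetical file name
    document = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID; substitute your current Sonnet release
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": (
            "Here is a long document:\n\n" + document + "\n\n"
            "Produce a table of contents as a Markdown list, one entry per section, "
            "each entry linking to its heading anchor (e.g. [Methods](#methods)). "
            "Index only what is actually in the document; do not add or summarize content."
        ),
    }],
)

print(response.content[0].text)
```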
Occasionally I have used Gemini Pro 2.5 for larger documents that exceed Claude’s 200K context window, but I find Gemini more prone to hallucination than Claude, so it is not my first choice.
I tried out GPT-4.1 mini recently and I have been impressed:
1 - Almost as good as Claude at analyzing factual documents
2 - No appreciable hallucination that I have seen so far - certainly no more than with Claude
3 - A 1 million-token context window for GPT-4.1 mini vs. 200K for Claude
4 - GPT-4.1 mini costs about 10% as much as Claude 4 Sonnet
So GPT-4.1 mini is basically 90% as capable at 10% of the cost, with a context window five times larger than Claude 4 Sonnet’s.
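As a rough way to see what those numbers mean in practice, here is a back-of-the-envelope sketch. The per-token prices are placeholders I am assuming for illustration; substitute the current figures from each provider’s pricing page before relying on the output.

```python
# Rough check: will a document set fit in each model's context window,
# and roughly what would one pass over it cost? Prices are assumed
# placeholders -- check the providers' pricing pages for current figures.

MODELS = {
    # name               (context window, USD per 1M input tokens -- assumed)
    "gpt-4.1-mini":      (1_000_000, 0.40),
    "claude-sonnet-4":   (200_000,   3.00),
}

def report(doc_tokens: int) -> None:
    for name, (window, price_per_mtok) in MODELS.items():
        fits = "fits" if doc_tokens <= window else "does NOT fit"
        cost = doc_tokens / 1_000_000 * price_per_mtok
        print(f"{name}: {fits} in a {window:,}-token window, ~${cost:.2f} per pass")

# e.g. a 500-page document set at roughly 600 tokens per page
report(300_000)
```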
For simple and frequent tasks like summarizing, I use the cheapest (and therefore fastest) models, such as GPT-4.1 Nano or Gemini 2.5 Flash Lite. Claude 4 Sonnet and Claude 4 Opus are still my preferred choices for more complex tasks, e.g. deep web research, coding, etc. But for me the context window isn’t as important as it is in your case.
Overall I cannot even remember the last hallucination, since I almost never rely on the model’s trained knowledge on its own, and since DEVONthink filters out, for example, irrelevant results from web/Wikipedia searches (irrelevant context can easily confuse LLMs).
Hallucinations have certainly become rarer as LLMs and DEVONthink have improved.
I find that the situation that still most often leads to hallucination is being too specific in a prompt.
If, for example, I ask an LLM to “find 5 articles rebutting this paper,” it might find only 3 good ones and make up 2 more. Hallucinations seem much less likely when I don’t pin the response down to a specific quantity of words or ideas.
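To make that concrete, here are two phrasings of the same request; the wording is purely illustrative. The over-specified version bakes in a count the model may feel obliged to fill, while the open-ended one leaves room to return fewer results, or none.

```python
# Illustrative prompt wording only -- not any particular product's API.
# The over-specified prompt demands a fixed count, which tempts the model to
# pad the list with invented citations; the open-ended prompt lets it return
# only what it can actually support.

overspecified = (
    "Find 5 articles rebutting this paper and give a full citation for each."
)

open_ended = (
    "Find articles rebutting this paper. List only the ones you can verify "
    "from the attached sources, and say so if you find fewer than expected "
    "or none at all."
)
```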
Still going back and forth between ChatGPT and Claude Sonnet, but I seem to be using the latter more.
I’m working on a large novel with lots of chapters, and one advantage is that Claude offers Project Knowledge, a place to upload files you can ask Claude to consult when responding to a query. It helps overcome the problem of a chat’s knowledge vaporizing when it is closed – I can go back and look at a chat, but Claude doesn’t remember it. It doesn’t always work very well, though. I can ask Claude to search through all those files for, say, a character name, and “he” finds less than half of the occurrences. (Anybody got a gender solution here? I know the name Claude can be male or female. Maybe this is a “they” situation? And it’s awkward to talk about having a chat with Chat.)
I more often get cut off in Claude than in Chat, and have to wait 4 or 5 hours to get back on. Sometimes that’s welcome; I should take a break.
I like the way Claude writes and have seen fewer hallucinations; when they do occur, the apologies and offers to rein in his imagination are really amusing.
I use DuckDuckGo’s Assist for casual queries. I tend to like Chat for more factual questions, but I don’t always trust the answers.
I wish s/he had taken firmer hold, with the slash a stand-in for “and”, “or”, or “everybody in between” – and having the s come first is a nice corrective. At least no one would be confused about whether it’s singular or plural … which I am, for all that I understand why it’s being used. Too bad “one” is so posh.