Availability of GPT-5 Nano and Mini

Hi all,

OpenAI’s API page now lists GPT-5 Nano and Mini, as well as other models, but those are the ones I was primarily looking for because they are presumably better and also cheaper to use. Is there a way to get them into DEVONthink?

BR AWD

The next release will support GPT-5, but so far our results are a mixed bag; in many cases GPT-4.1 is still superior.


Do you have an idea why it is “below” GPT-4.1? Due to the smaller context window?

Only OpenAI could answer this question, but the context window didn’t matter in our tests. Maybe the model has been optimized for certain usage scenarios or for their own ChatGPT.app?

Prompts that basically work for most models (including local ones like Gemma 3, inexpensive commercial ones like Gemini 2.5 Flash Lite, and even OpenAI’s older models like GPT-4.1 Nano) suddenly return noticeably worse results. Different settings for the reasoning effort made no difference either.
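If anyone wants to try this comparison themselves outside of DEVONthink, the reasoning effort is just a per-request parameter. Here is a minimal sketch using OpenAI’s Python SDK and the Responses API; the model name, prompt, and the exact set of effort levels are assumptions on my part, so check the current API reference:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Send the same prompt at different reasoning-effort levels and compare the answers.
for effort in ("minimal", "low", "medium", "high"):
    response = client.responses.create(
        model="gpt-5-mini",               # placeholder model name
        reasoning={"effort": effort},     # the setting mentioned above
        input="Summarize the key points of the attached meeting notes.",
    )
    print(f"--- effort={effort} ---")
    print(response.output_text)
```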

When I first read the GPT-5 announcements and early reviews, I was disappointed as well - it seemed to be a step down from GPT-4.1. But then I realized that two things were lost in the details of the initial coverage: (1) OpenAI specifically says GPT-5 excels at analysis of health and other professional documents; and (2) while the context window of GPT-5 is smaller than GPT-4.1’s, its maximum output is far larger.

I tried it in a web-based project I am working on, and what I noticed most is that for similar prompts, GPT-5 is indeed much more detailed in its output. Compare, for example, some of my OpenRouter usage data, which shows both input and output tokens:

So I took a fake set of medical documents that I created for testing/demonstration purposes and compared the results:

Claude 4 Sonnet:

GPT-4.1 Mini:

GPT-5 Mini:

At least for this task, GPT-4.1 Mini produces a report that is essentially as useful as Claude’s at 10% of the cost. There are some nuanced details that GPT-4.1 Mini misses with more complex documents, but I think most users would consider the Claude and GPT-4.1 Mini reports equivalent. GPT-4.1 Mini also has a much larger context window than Claude, which is a big plus.

GPT-5 is interesting. It has the smallest context window of the three, yet its report is by far the most detailed, especially with regard to providing links to its sources, which is the most critical role of AI for my purposes. OpenAI’s approach of allowing essentially unrestricted output tokens shows its capabilities here.

While the smaller context window is a limitation, I can get around that by breaking up documents into chapters and then creating a “summary of summaries” to combine the information from each chapter. That is more complex than simply summarizing each document, but once the algorithm has been created, the process becomes seamless to the user. While DT4 is by far the best example of AI integration I have seen in any application, this use case is one reason why I believe a way to split PDF documents every n pages via AppleScript would be a stellar additional feature in DT4.
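For those curious how the “summary of summaries” step can look in practice, here is a rough, model-agnostic sketch in Python. The chunk size and the summarize() placeholder are purely illustrative assumptions (not DEVONthink or OpenAI calls); wire summarize() to whatever model or script you actually use:

```python
# Map-reduce style summarization: split a long document into chunks that fit the
# model's context window, summarize each chunk, then summarize the summaries.

def chunk_text(text: str, max_chars: int = 20_000) -> list[str]:
    """Split text into roughly equal pieces small enough for the model's context."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize(text: str) -> str:
    """Placeholder for a single model call that returns a summary of the text."""
    raise NotImplementedError("connect this to your preferred chat model")

def summary_of_summaries(document: str) -> str:
    # Map step: summarize each chunk (chapter) independently.
    chunk_summaries = [summarize(chunk) for chunk in chunk_text(document)]
    # Reduce step: combine the per-chunk summaries into one final report.
    return summarize("Combine these chapter summaries into one report:\n\n"
                     + "\n\n".join(chunk_summaries))
```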


Actually, GPT-5 has a new verbosity API setting; by default it seems to be much more verbose. Whether that’s an improvement depends on the usage scenario.
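For anyone who wants to experiment with that setting directly, it appears as a text.verbosity parameter in the Responses API. A minimal sketch with OpenAI’s Python SDK; the model name and the value names are assumptions on my part, so verify against the current API docs:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Ask for terse output instead of the (apparently verbose) default.
response = client.responses.create(
    model="gpt-5-mini",              # placeholder model name
    text={"verbosity": "low"},       # assumed values: "low" | "medium" | "high"
    input="List the key findings of this report in one short paragraph.",
)
print(response.output_text)
```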


Agreed

Almost seems like GPT-5 could have been an alternate model of GPT-4. Like “GPT-4 Verbose.”

The “verbosity” setting seems to be an easier-to-understand way of communicating that it can produce far more output tokens than OpenAI’s other models. As you say, that can be good or bad depending on the use.