DT4 AI cost tracking

Is there a way to track AI usage from the DT4 side? (Requests, Input, Cached input, Output)

In my OpenAI account, I added $20 as a test, expecting that amount to last a couple of weeks because of careful low usage. I asked, I think, 2 questions and it drained the $20 in about 24 hours. I am trying to understand how the money was spent.

My organization is currently in usage tier 1. The model I have set is GPT-4.1 nano, with usage set to Automatic. I am not using image generation or transcription.

From the OpenAI reports side it shows 853 requests and $19.46 total usage, all under gpt-4o-2024-08-06: input $3.922, cached input $3.20, output $12.336.

From the DT4 side I would like to keep track of when and where requests are being made. How can I best achieve this?

At this burn rate I could easily be spending $200 per month and that is not realistic for my budget.

Settings > AI > Chat > Usage. Tokens per model are listed there.
Also, ~/Library/Application Support/DEVONthink/Chat.log
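If you want more than an occasional look at that file, a small script can poll it for new entries. This is only a sketch: the path is the one mentioned above, but the log's internal format isn't documented here, so lines are treated as opaque text records.

```python
import os

# Path mentioned in the thread; the log's internal format is an
# assumption, so each line is treated as an opaque text record.
LOG_PATH = os.path.expanduser(
    "~/Library/Application Support/DEVONthink/Chat.log"
)

def read_new_lines(path, offset):
    """Return (new_lines, new_offset): anything appended since offset."""
    with open(path, "r", errors="replace") as f:
        f.seek(offset)
        return f.readlines(), f.tell()

# Demo against a temporary file so the sketch runs without DEVONthink.
demo = "demo_chat.log"
with open(demo, "w") as f:
    f.write("request 1\n")
first, pos = read_new_lines(demo, 0)   # picks up the first entry
with open(demo, "a") as f:
    f.write("request 2\n")
second, pos = read_new_lines(demo, pos)  # only the newly appended entry
os.remove(demo)
```

Calling `read_new_lines(LOG_PATH, saved_offset)` on a timer would give you a running record of when requests are made.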

What kind of questions and what was selected? For each model the used input/output tokens are shown in Settings > AI > Chat.

DEVONthink 4.0beta2 doesn’t even support GPT-4o anymore, do you use your API key anywhere else? Do you use chat actions/placeholders in smart rules or batch processing? Or features like Data > Tags > Add Chat Suggestions to Documents or Tools > Summarize Documents via Chat?

E.g. I perform many tests almost every day, some automated, others not. And the total costs for Anthropic, OpenAI, Mistral, Google, Perplexity & Replicate.com over the last two years didn't exceed a few hundred dollars, less than a ChatGPT subscription costs each month.

But the documents used are usually relatively small, and I use the cheapest models for testing, as the more expensive ones should be able to handle anything the cheap models can do. Ironically, that's not always the case…

Thanks. That Chat.log is amazing. I will use it to keep track of AI transactions.

You’re welcome :slight_smile:

  • I was using 4.0beta1. I tested from 04/09/25 to 04/11/25

  • No other app was connected to the API key

  • API key was used only in DT4

  • Did not use chat actions/placeholders in smart rules or batch processing, nor features like Data > Tags > Add Chat Suggestions to Documents

  • Summarize Documents via Chat? I believe I did (based on the log; it has been over a month since this initial test).

  • Would it help you to see the log?

This might help, just send it to cgrunenberg - at - devon-technologies.com. Thanks!

Did you find out meanwhile with the log what has caused the high costs in your OpenAI bill?

In general the output cost was the largest:
Total usage $19.46, all under gpt-4o-2024-08-06: input $3.922, cached input $3.20, output $12.336.

What specifically drove that output cost, I have not been able to determine. The Chat.log is over 800 pages for just two days of use, with lots of repetitive text. I spent some time going through it and also tried to cross-reference it with the log from OpenAI, but got nowhere.
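For a log that large and that repetitive, counting duplicate lines first can narrow things down before reading it page by page. A minimal sketch (the log format is assumed to be plain text, one record per line):

```python
import os
from collections import Counter

def top_repeated_lines(path, n=10):
    """Return the n most frequent non-empty lines in a text log."""
    with open(path, "r", errors="replace") as f:
        counts = Counter(line.strip() for line in f if line.strip())
    return counts.most_common(n)

# Demo on a tiny stand-in file; the real Chat.log path would go here.
demo = "demo.log"
with open(demo, "w") as f:
    f.write("POST /v1/chat\nPOST /v1/chat\nresponse ok\n")
top = top_repeated_lines(demo, 2)
os.remove(demo)
```

The lines that dominate the count are usually a good hint about which feature generated most of the traffic.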

For sure I will have the Chat.log open when I begin testing again. I think it is good practice to be very aware of the inputs and outputs when using these AI tools. This is a case where we are throwing money at the service with little idea of the value of what we get back. Eventually, like cell phone service, it will have to migrate to bundled pricing.

Indeed, that is exactly what to expect. Output tokens are the more expensive part of the conversation. For some models, even double the cost.

For most AI models you can include in your prompt something like “Limit your response to no more than XX tokens.”
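Besides asking in the prompt, most chat APIs also accept a hard server-side cap. A hedged sketch of an OpenAI-style request payload (`max_tokens` is the chat-completions parameter for capping output tokens; the model name and values here are illustrative, not a recommendation):

```python
# Sketch of a chat request payload with a hard output cap.
# The model name and the limit of 100 are illustrative only.
payload = {
    "model": "gpt-4.1-nano",
    "messages": [
        {"role": "user",
         "content": "Summarize this note. Limit your response to 100 tokens."},
    ],
    # Server-side cap on output tokens, enforced regardless of the
    # prompt wording above.
    "max_tokens": 100,
}
```

The prompt wording is advisory; the `max_tokens` field is what actually stops the model from running long.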

You could also switch AI > Chat > Usage to Cheapest.

The difference is typically much larger, actually: e.g. Anthropic 5 times, Mistral 3-5 times, OpenAI 4 times, Gemini 2.5 Pro 8 times, or Gemini 2.5 Flash up to 23 times in the case of reasoning.

But when processing documents or search results, the input tokens usually make up the majority of the costs. Only when translating, transforming or rewriting text, or relying solely on the LLM's own knowledge, are the output tokens the more important ones.
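To see why the split matters, a quick back-of-the-envelope estimator. The prices below are illustrative placeholders (output priced at 4x input, roughly the OpenAI ratio mentioned above), not current rates:

```python
def chat_cost(input_tokens, cached_tokens, output_tokens,
              in_price, cached_price, out_price):
    """Cost in dollars given token counts and per-million-token prices."""
    return (input_tokens * in_price
            + cached_tokens * cached_price
            + output_tokens * out_price) / 1_000_000

# Illustrative prices in USD per 1M tokens; output at 4x input.
cost = chat_cost(500_000, 200_000, 300_000,
                 in_price=2.50, cached_price=1.25, out_price=10.00)
```

With these made-up numbers, 300k output tokens cost twice as much as 500k input tokens, which matches the pattern in the bill above: fewer output tokens, but the biggest line item.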


I like that tip. Thanks!

Yes. I wanted to see what Automatic did. Now I know. I will switch. Thanks.