Is there a way to track AI usage from the DT4 side? (Requests, Input, Cached input, Output)
In my OpenAI account, I added $20 as a test, expecting that amount to last a couple of weeks with careful, low usage. I asked, I think, two questions, and it drained the $20 in about 24 hours. I am trying to understand how the money was spent.
My organization is currently in Usage tier 1. The model I have set is GPT 4.1 (nano), with Auto usage. I am not using image generation or transcription.
The OpenAI usage report shows 853 requests, for a total of $19.46: gpt-4o-2024-08-06 input $3.922, cached input $3.20, output $12.336.
From the DT4 side I would like to keep track of when and where requests are being made. What is the best way to achieve this?
At this burn rate I could easily be spending $200 per month and that is not realistic for my budget.
What kind of questions did you ask, and what was selected? The input/output tokens used per model are shown in Settings > AI > Chat.
DEVONthink 4.0beta2 doesn’t even support GPT-4o anymore. Do you use your API key anywhere else? Do you use chat actions/placeholders in smart rules or batch processing, or features like Data > Tags > Add Chat Suggestions to Documents or Tools > Summarize Documents via Chat?
For example, I perform many tests almost every day, some automated, some not. The total costs for Anthropic, OpenAI, Mistral, Google, Perplexity & Replicate.com over the last two years didn’t exceed a few hundred dollars, i.e. less than a ChatGPT subscription costs each month.
But the documents used are usually relatively small, and I use the cheapest models for testing, as the more expensive ones should be able to handle whatever even the cheap models can do. Ironically, that’s not always the case…
In general, the output was the largest cost:
Total use $19.46: gpt-4o-2024-08-06 input $3.922, cached input $3.20, output $12.336.
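For what it’s worth, working backwards from what I believe are the gpt-4o-2024-08-06 list prices (input $2.50, cached input $1.25, output $10.00 per 1M tokens; worth double-checking against the current pricing page), the spend would translate to roughly these token volumes:

```python
# Rough back-calculation from the billed amounts to token volumes.
# The prices are my assumption of gpt-4o-2024-08-06 list rates (USD per 1M tokens);
# verify against the current OpenAI pricing page before relying on these numbers.
PRICE_PER_M = {"input": 2.50, "cached_input": 1.25, "output": 10.00}

spend = {"input": 3.922, "cached_input": 3.20, "output": 12.336}

for kind, dollars in spend.items():
    tokens = dollars / PRICE_PER_M[kind] * 1_000_000
    print(f"{kind:>13}: ${dollars:6.3f} ≈ {tokens:,.0f} tokens")

# If the assumed rates are right, that is roughly 1.2M output tokens
# across 853 requests, i.e. on the order of 1,400 generated tokens per request.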
What specifically made up that output cost I have not been able to determine. The chat.log is over 800 pages for just two days of use, with lots of repetitive text. I spent some time going through it and also tried to cross-reference it with the log from OpenAI, but got nowhere.
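If I dig into it again, something like this might at least tally how many entries landed on each day. This is only a sketch; it assumes each log entry begins with an ISO date (YYYY-MM-DD), which I have not verified against DEVONthink’s actual chat.log format:

```python
# Hypothetical sketch: count chat.log entries per day.
# Assumption (unverified): each request line starts with an ISO date.
import re
from collections import Counter

per_day = Counter()
with open("chat.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        m = re.match(r"(\d{4}-\d{2}-\d{2})", line)
        if m:
            per_day[m.group(1)] += 1

for day, count in sorted(per_day.items()):
    print(day, count)
```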
I will certainly have the chat.log open when I begin testing again. It seems like good practice to be very aware of the inputs and outputs when using these AI tools. Right now we are throwing money at the service with little idea of the value of what we get back. Eventually, like cell phone service, it will have to migrate to bundled pricing.
The difference between output and input token prices is actually much larger typically, e.g. Anthropic 5 times, Mistral 3–5 times, OpenAI 4 times, Gemini 2.5 Pro 8 times, or Gemini 2.5 Flash up to 23 times in the case of reasoning.
But when processing documents or search results, the input tokens usually make up the majority of the cost. Only when translating, transforming, or rewriting text, or when using only the LLM’s knowledge, are the output tokens the more important ones.
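As a rough illustration of that split (the token counts and the 4× output/input price ratio below are made-up example values, not measurements):

```python
# Illustrative only: example token counts and a ~4x output/input price ratio.
input_price, output_price = 2.50, 10.00  # USD per 1M tokens

def cost(input_tokens, output_tokens):
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Summarizing a large document: lots of input, short answer -> input dominates.
print(f"summarize 50k-token document: ${cost(50_000, 1_000):.3f}")  # ~$0.135, mostly input

# Rewriting/translating a short text: output is about as long as the input,
# so the pricier output tokens dominate.
print(f"rewrite 2k-token text:        ${cost(2_000, 2_000):.3f}")   # ~$0.025, mostly output
```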