OpenAI CEO Sam Altman acknowledged in early June 2026 that AI token costs have become "a huge issue" for companies deploying AI products, as reported by Tom's Hardware. The admission comes as enterprise clients report exhausting their entire annual AI budgets in the first quarter of 2026, and as OpenAI's single largest user consumes 100 billion tokens per month.
What Sam Altman Said
Speaking publicly in early June 2026, Altman described the shift in tone from enterprise customers: companies are telling him "My company spent my entire 2026 budget in Q1. Can you make this more efficient?" Token cost concerns went from never coming up at the start of 2026 to becoming, in Altman's words, a meme among clients. This marks the first time in OpenAI's history that spending efficiency has displaced capability as the primary concern from enterprise buyers.

Altman also illustrated the scale of the problem with a specific data point: OpenAI's top user processes 100 billion tokens every month. Six years ago, the highest-volume user consumed roughly 100,000 tokens per month. That is a one-million-fold increase in peak single-user volume in six years. Altman added that even this user is not the global record holder.
The Industry Is Responding With Caps and Cutbacks
The cost pressure is now producing concrete operational responses across the industry. Uber capped AI tool spending at $1,500 per employee per month after costs climbed beyond budget forecasts. Amazon scrapped its internal token-usage leaderboard after the rankings created pressure to consume more rather than work efficiently. Microsoft has reduced Claude Code license counts for its teams, citing increased AI tooling costs across the organization.
As additional reporting confirms, these decisions are not isolated experiments. Uber, Amazon, and Microsoft are among the largest AI tooling buyers globally. When they begin setting hard spending caps and removing vanity metrics that encouraged overconsumption, it signals that the "buy now, optimize later" phase of enterprise AI adoption is ending.
Why Token Costs Are Spiking
Three dynamics are driving the surge simultaneously.
Longer context windows. Modern frontier models support context windows from 128K to 1 million tokens. Developers who fill those windows on every API call multiply per-request costs by 10x to 100x compared to shorter prompts. Longer contexts are genuinely useful for document analysis and multi-turn agents, but they are expensive when used without discipline.
Agentic workflows running in loops. A coding agent that plans, executes, evaluates, and revises can make 10 to 30 model calls for a single task. Enterprise teams deploying agents across hundreds of engineers can generate millions of API calls per day without any single person noticing the accumulation.
No cost visibility at the point of use. Most AI tools present no per-query cost feedback. Developers using AI coding assistants or chat tools see output quality but not the token meter. Without real-time cost feedback, budgets get consumed faster than planned.
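How these dynamics compound can be seen with a back-of-the-envelope estimate. The sketch below is illustrative only: the team size, tasks per engineer, calls per task, and tokens per call are assumed numbers, not figures from Altman or the companies named above.

```python
def daily_agent_load(engineers: int, tasks_per_engineer: int,
                     calls_per_task: int, tokens_per_call: int) -> tuple[int, int]:
    """Return (model calls per day, tokens per day) for a team running agents."""
    calls = engineers * tasks_per_engineer * calls_per_task
    return calls, calls * tokens_per_call


# Assumed inputs: 800 engineers, 50 agent tasks each per day,
# 25 model calls per task, 20,000 tokens of context per call.
calls, tokens = daily_agent_load(engineers=800, tasks_per_engineer=50,
                                 calls_per_task=25, tokens_per_call=20_000)
print(f"{calls:,} calls/day, {tokens:,} tokens/day")
```

Under these assumptions the team makes 1,000,000 calls and moves 20 billion tokens per day, even though no individual engineer sees more than their own 50 tasks.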
What Creators and Developers Should Do
Token costs will not drop fast enough to make this problem go away on its own. Here are practical steps to manage AI spending without sacrificing output quality.
Measure before you cut. Enable token logging in your AI tooling to understand your actual consumption baseline. Most API platforms provide usage dashboards. You cannot optimize what you have not measured.
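Where a dashboard is too coarse, a small in-process ledger can provide the baseline. The following is a minimal sketch, assuming your client library reports input and output token counts for each call. The `TokenLedger` class and the feature names are hypothetical and not part of any provider's SDK.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Accumulates token counts per feature so a baseline can be read off later."""
    totals: dict = field(
        default_factory=lambda: defaultdict(lambda: {"calls": 0, "input": 0, "output": 0})
    )

    def record(self, feature: str, input_tokens: int, output_tokens: int) -> None:
        row = self.totals[feature]
        row["calls"] += 1
        row["input"] += input_tokens
        row["output"] += output_tokens

    def report(self) -> list[tuple[str, int, int]]:
        """Rows of (feature, calls, total tokens), heaviest consumer first."""
        rows = [(name, r["calls"], r["input"] + r["output"])
                for name, r in self.totals.items()]
        return sorted(rows, key=lambda row: row[2], reverse=True)


ledger = TokenLedger()
ledger.record("code_review", 12_000, 800)    # token counts here are made up
ledger.record("code_review", 9_500, 650)
ledger.record("ticket_triage", 700, 40)
for feature, calls, tokens in ledger.report():
    print(f"{feature}: {calls} calls, {tokens:,} tokens")
```

Sorting by total tokens puts the heaviest consumer first, which is usually the place to start optimizing.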

Match model size to task. Frontier models cost 5x to 20x more per token than smaller models. For classification, summarization, and simple Q&A, a faster and cheaper model performs at near-identical quality. Reserve large models for tasks requiring deep reasoning or long-context synthesis.
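One way to apply this rule is a routing function that sits in front of the API call. This is a sketch under stated assumptions: the model names `small-model` and `frontier-model` are placeholders, and the 32,000-token threshold is an arbitrary illustrative cutoff rather than a recommendation.

```python
# Task types the paragraph above identifies as safe for a smaller model.
SIMPLE_TASKS = {"classification", "summarization", "simple_qa"}


def pick_model(task_type: str, context_tokens: int,
               long_context_threshold: int = 32_000) -> str:
    """Route to the cheaper tier unless the task needs deep reasoning or long context."""
    if task_type in SIMPLE_TASKS and context_tokens <= long_context_threshold:
        return "small-model"     # hypothetical cheap tier
    return "frontier-model"      # hypothetical expensive tier


print(pick_model("classification", 1_200))
print(pick_model("summarization", 200_000))
print(pick_model("architecture_review", 5_000))
```

Defaulting unknown task types to the larger model keeps quality safe. Teams more sensitive to cost than to quality might invert that default.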
Compress your system prompts. System prompts that run to 2,000 tokens on every call can often be compressed to 400 tokens without losing essential behavior. Prompt compression tools and careful editing can cut per-call input costs by 30 to 70 percent, depending on how much of each call the system prompt accounts for.
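Shrinking a system prompt from 2,000 to 400 tokens removes 80 percent of that prompt, but the saving per call depends on how many other input tokens accompany it. The sketch below works through that arithmetic. It counts input tokens only and ignores output tokens, and the two "other input" sizes are assumed values.

```python
def per_call_savings(system_before: int, system_after: int,
                     other_input_tokens: int) -> float:
    """Fraction of input tokens saved per call after shortening the system prompt."""
    before = system_before + other_input_tokens
    after = system_after + other_input_tokens
    return (before - after) / before


# Short user message: the system prompt dominates the call.
print(f"{per_call_savings(2_000, 400, 300):.0%}")
# Long user message or retrieved context: the system prompt is a smaller share.
print(f"{per_call_savings(2_000, 400, 3_000):.0%}")
```

With 300 other input tokens the saving is about 70 percent. With 3,000 it falls to 32 percent.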
Use prompt caching. If your workflow uses the same system prompt repeatedly, prompt caching stores the result of processing that prefix so subsequent calls reuse it at a significant discount. Anthropic and OpenAI both offer caching discounts on repeated prefix tokens. For workflows with consistent system prompts, this can reduce costs by 50 to 80 percent on the prompt portion of each call.
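The effect of a caching discount on a single call can be estimated as follows. This is a simplified sketch. The $3 per million token price and the 50 percent discount are assumed for illustration, and the function ignores details such as cache-write pricing and minimum prefix lengths, which vary by provider.

```python
def prompt_cost(prefix_tokens: int, dynamic_tokens: int, price_per_million: float,
                cached_discount: float, cache_hit: bool) -> float:
    """Input cost in dollars for one call with a repeated prefix and a changing suffix."""
    prefix_rate = price_per_million * (1 - cached_discount) if cache_hit else price_per_million
    return (prefix_tokens * prefix_rate + dynamic_tokens * price_per_million) / 1_000_000


# Assumed: 2,000-token system prompt, 500-token user message, $3 per million input tokens.
miss = prompt_cost(2_000, 500, 3.0, cached_discount=0.5, cache_hit=False)
hit = prompt_cost(2_000, 500, 3.0, cached_discount=0.5, cache_hit=True)
print(f"miss ${miss:.4f}, hit ${hit:.4f}, saved {1 - hit / miss:.0%} of input cost")
```

The discount applies only to the cached prefix. In this example a 50 percent discount on the prefix works out to a 40 percent reduction in total input cost.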
Set hard spending limits before deploying. API dashboards let you set hard monthly spending caps. Treat AI API costs like cloud infrastructure: establish a budget ceiling before deploying a workflow to production, not after the first unexpected bill arrives.
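A provider-side cap can be complemented by a guard inside the application that refuses calls once a ceiling is reached. The sketch below is one possible shape for such a guard. `BudgetGuard` and `BudgetExceeded` are hypothetical names, and the $1,500 limit simply reuses the per-employee figure reported for Uber as an example value.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the configured ceiling."""


class BudgetGuard:
    def __init__(self, monthly_limit_usd: float):
        self.limit = monthly_limit_usd
        self.spent = 0.0

    def charge(self, estimated_cost_usd: float) -> None:
        """Reserve budget before a call; refuse the call if it would breach the cap."""
        if self.spent + estimated_cost_usd > self.limit:
            raise BudgetExceeded(
                f"would spend {self.spent + estimated_cost_usd:.2f} of {self.limit:.2f} USD"
            )
        self.spent += estimated_cost_usd


guard = BudgetGuard(monthly_limit_usd=1500.0)
guard.charge(1499.0)          # fits under the ceiling
try:
    guard.charge(2.0)         # would exceed it, so the call is refused
    blocked = False
except BudgetExceeded:
    blocked = True
print(blocked, guard.spent)
```

Checking the budget before the call rather than after means a runaway agent loop stops at the ceiling instead of overshooting it.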
Key Details
- Statement date: June 2026
- Speaker: Sam Altman, CEO of OpenAI
- Key figure: Top user consumes 100 billion tokens per month (up from 100K six years ago, a one-million-fold increase)
- Industry response: Uber $1,500/employee/month cap; Amazon removed token leaderboard; Microsoft reduced Claude Code licenses
- Context: Cost concerns went from never coming up at the start of 2026 to "a huge issue" by early June
What to Do Next
For independent creators using AI tools, audit which subscriptions are on auto-pay and whether they are generating proportional value. Tools like AI coding assistants and agentic pipelines can consume tokens invisibly. Check your usage dashboards monthly, not just when the billing email arrives. Starting with smaller models and escalating to frontier models only for specific high-value tasks is the sustainable default.
For enterprise teams, the shift from "AI is cheap enough" to "AI costs need governance" is now visible at Uber, Amazon, and Microsoft simultaneously. Building cost awareness into AI workflows in 2026 is not optional. It follows the same discipline that cloud cost management required in 2019 through 2021. Review the OpenAI API reference for detailed token pricing and usage monitoring options for your specific model tier.

Frequently Asked Questions
Is Altman's 100 billion token figure per month or per year?
Per month. Altman described OpenAI's highest-volume single user consuming 100 billion tokens every month. This is separate from OpenAI's total platform consumption across all users.
What is prompt caching and does it meaningfully reduce costs?
Prompt caching stores the processed result of a repeated prefix, such as a long system prompt, so subsequent API calls reuse it at a discounted rate. Anthropic and OpenAI both offer this. For workflows with consistent system prompts sent on every call, caching can reduce costs by 50 to 80 percent on the cached portion. The savings add up quickly at scale.
Are token costs going to drop as models improve?
The trend has been toward lower per-token costs over time. The 2026 price for frontier model capability is dramatically lower than what equivalent capability cost in 2023. However, usage growth is outpacing price decreases at the enterprise level, and context windows keep growing. Total costs rise even as per-token prices fall unless usage discipline improves alongside capability growth.
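The relationship is simple multiplication, as the sketch below shows. The multipliers are assumed values chosen to illustrate the point, not measured price or usage trends.

```python
def spend_change(price_multiplier: float, usage_multiplier: float) -> float:
    """Total spend multiplier when per-token price and token volume both change."""
    return price_multiplier * usage_multiplier


# Assumed: per-token price halves while token usage quadruples.
print(spend_change(price_multiplier=0.5, usage_multiplier=4.0))
```

In this scenario the bill doubles even though each token costs half as much.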
Should independent creators worry about token costs?
At small scale, individual usage costs are usually manageable. The problem surfaces at automation scale: running agents, batch processing content, or integrating AI deeply into publishing workflows. If you run autonomous content pipelines or agentic tools daily, monitoring your token spend is worthwhile even at modest individual budgets.
What happened to Amazon's token usage leaderboard?
Amazon reportedly scrapped its internal token-usage leaderboard after recognizing that it created pressure to consume more tokens rather than use AI efficiently. The leaderboard was an Amazon tool, not an OpenAI product. OpenAI's top-user figure came from Altman speaking publicly about usage scale, not from a formal leaderboard.