11 Sources
[1]
The token bill comes due: Inside the industry scramble to manage AI's runaway costs
Across the industry, companies are starting to balk at the price of AI. Uber blew through its entire 2026 AI coding budget by April. Microsoft revoked its developers' Claude Code licenses months after enabling them. A Priceline employee told TechCrunch that a routine Cursor contract renewal came back 4-5x more expensive. Even though per-token prices have fallen, the push for more AI adoption and increasingly autonomous agents have driven token consumption higher and higher. Companies that gorged themselves in early 2025 on all-you-can-eat subscriptions are now scrambling to understand where their money is going, pull back spending, and figure out whether they can salvage some ROI from the wreckage of their budgets. Meanwhile, a market is forming to meet them there. Startups, established vendors, and a new standards body are all racing to give companies the tools and language to track what they spend. "Six months ago, I would have a conversation with a customer and it would be all about 'What can it do? Is it good enough?'" Alexander Embricos, OpenAI's head of enterprise, told TechCrunch at an event in New York City this week. "Our conversations are never about that now. Now the conversations are about, 'hey, we're spending so much. What visibility do you have? What auditability do you have? What token controls do you have? What is the efficiency of your models?'" It's against this backdrop that the Linux Foundation this week unveiled plans for the Tokenomics Foundation, a new standards body that aims to instill the same cost discipline around AI tokens that FinOps did for cloud spend. "In April and May, I started hearing from companies: 'Oh my god, we are 3x over our entire 2026 token budget and it's only April,'" J.R. Storment, executive director of the FinOps Foundation, a project under the Linux Foundation, told TechCrunch. 
"We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and 'go fast' to 'we need guardrails, how do we control this?'" The cries heard round the tech world followed fervent demands from CEOs pushing their teams to use the best models and move fast, costs be damned. New models released in November like Anthropic's Claude Opus 4.5, OpenAI's GPT-5.1, and Google's Gemini 3 Pro brought significant improvements to agentic tools, which have multiplied consumption. It's how one company reportedly found itself with a $500 million Claude bill after forgetting to set usage limits for employees. "It's like the crack-cocaine epidemic," says Chris Reed, senior director of IT finance at Priceline, when asked about the pricing issue in using AI. "They let you try it to get you hooked on it, and now you're kind of beholden to it." Vitaly Gordon, CEO of engineering operations platform Faros AI, said he recently spoke to a CTO who told him: "One of my engineers spent $40,000 on tokens last month, and I genuinely don't know whether I should stop him or should I go and tell everyone else to be like him." A March survey by Faros found that among 20,000 developers, output was rising, but so were bugs and rewrites. Jellyfish, an engineering management platform, similarly found engineers who used the most tokens were about twice as productive than those who used AI less, but they spent 10x the number of tokens to get there. Nicholas Arcolano, head of research at Jellyfish, told TechCrunch via email that expenditure on AI is exploding in large part due to agentic features, with per-developer consumption rising about 18.6x in nine months. All in all, these stats make the productivity case murkier than the spending suggests. "Whether extreme spend pays off comes down to the ultimate business value of shipped code (e.g. revenue), which most companies still can't measure," Arcolano said. 
At least some of that measurement issue is the sheer scale at which AI is being used today. "Tracking cloud costs is a hundreds-of-millions-of-rows-a-month data problem," Storment said. "Tracking token costs is a trillions-of-rows-a-month data problem. You can't just stick that into whatever spreadsheet or even basic tool. You've got to fundamentally rethink your tooling, your specs and your accounting systems to do that." At Priceline, Reed is already seeing discrepancies. He noted issues between a vendor's reported usage and Priceline's internal data. "I started my career in telecom expense management, and I'm seeing all the same parallels, from telecom to cloud to AI," he said. "Anytime you introduce something new, it's ripe for billing errors and audit and optimization opportunities." A market is beginning to form around this problem. There are the pure-play companies, like Pay-i, which tracks, measures and optimizes the costs and performance of GenAI investments. Paid, meanwhile, lets developers track costs, measure usage and bill users based on actual value rather than subscription fees. Then there are companies like Jellyfish, Waydev and Faros AI, which all provide AI agent monitoring to prove the ROI of developer tools. Storment says most of the 180 vendors within the FinOps Foundation are leaning towards this space. Companies with existing distribution are also adding new features to capitalize on this new market. Ramp has recently moved into AI spend management; Datadog and New Relic have tacked on services like cloud cost management, token-level observability, and GPU monitoring. At the FinOps X conference next week, AWS is expected to introduce new financial management features geared toward enterprise AI spending. Tiffany Luck, a partner at NEA, thinks token efficiency and observability will likely be added in at the "harness or app layer." 
She pointed to Factory, a startup that makes AI agents for enterprises, which this week launched a model router that automatically picks the right model for every task. Gordon expects frontier labs and other model providers to adopt OpenRouter-style optimization to drive queries to the cheapest models -- a trend already showing up on enterprise Claude bills. "The financial report for how much you spend on Anthropic, even if you call the Opus model, some of the spend will be on Sonnet or Haiku, because they are smart enough to do it," Gordon said. "I think this will become more and more of a thing." But all these tools are being built without a common language or shared definitions for how much a token costs, what it produces, and how to compare spend across vendors. That's where the Tokenomics Foundation hopes to prove useful. The Foundation is building a canonical definition and framework for "tokenomics"; open standards, specifications and metrics for AI token usage and billing; as well as new metrics for AI economics, like cost-per-intelligence or tokens-per-watt. It also plans to define metrics across token factory effectiveness and consumption efficiency. The group is planning a formal launch in July, and is about to announce more members at the FinOps X conference next week. "Token economics is fundamentally more abstract and opaque than anything we've managed at this scale before," Nishant Gupta, chief availability officer at Salesforce, said in a statement. "It requires a different operational muscle than the one the industry built for cloud." That said, Goldman Sachs projects global token usage to multiply by 24 times by 2030. The companies already over budget need solutions now, and the foundation's first deliverable is still months away. "Maybe we created a steam engine, but we still haven't figured out the assembly line," said Gordon. According to Arcolano, the smart move is broad, moderate adoption.
"The best ROI comes from moving the broad middle from low to moderate usage, not pushing heavy users higher," he said. Russell Brandom and Tim Fernholz contributed to this reporting.
[2]
Your AI Bill Is A Context Problem
Within four months, Uber burned through its entire 2026 AI budget. It then capped every engineer at $1,500 a month. Similarly, ServiceNow exhausted its full-year Anthropic coding budget in the first few months. Even Microsoft is winding its own engineers off Claude Code. All these examples remind me of the cloud bill shock of a decade ago, which was met with the same conviction that the cure is a tighter spend cap and a sharper rate. This time, though, I think it is the wrong cure. For sure, "tokenmaxxing" is what you get when you incentivize adoption without governing value. Uber, like Meta, literally ranked teams on a usage leaderboard to drive adoption. But capping the bill mistakes a problem of value for a problem of price. The Uber COO said it clearly: between all that Claude Code spend and anything customers can feel, "that link is not there yet." And that gap, not the bill, is the problem.
Your AI bill exposes context debt
Why did the token consumption numbers explode? Agentic workflows don't call a model once: they loop. Anthropic's own engineering research put a single agent at 4x the tokens of a chat interaction, and a multi-agent system at 15x, and every turn of that loop re-feeds the context window. As the window fills, the signal-to-noise ratio collapses, and the quality, latency, and cost degrade together. So the bill includes both the price of intelligence the company consumes and the runtime tax it pays when its own knowledge isn't machine-readable. And its agents have to brute-force the missing meaning back into the window on every single call. This is context debt, billed by the token. Some of that context is visible; much of it is not. A user sees the prompt they typed, but the model call may also include platform-supplied instructions, prior interaction history, tool metadata, retrieval scaffolding, and other orchestration elements. That hidden context may be necessary, but it is also billable.
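To make the re-feeding effect concrete, here is a minimal sketch of how input tokens accumulate across an agent loop. It is an illustration under stated assumptions, not a model of any vendor's billing: it assumes every call is billed for the whole context window as input, that no prompt caching or context compaction applies, and that each turn appends a fixed number of tokens. The function name, the hidden-context size, and the per-turn figure are all hypothetical.

```python
def total_input_tokens(turns: int, hidden_context_tokens: int, tokens_added_per_turn: int) -> int:
    """Sum the input tokens billed over an agent loop.

    Assumption: each call re-sends the full window (hidden platform context
    plus everything appended so far), with no caching or compaction.
    """
    total = 0
    window = hidden_context_tokens  # instructions, tool metadata, scaffolding
    for _ in range(turns):
        window += tokens_added_per_turn  # new message or tool result is appended
        total += window                  # the whole window is billed again
    return total


# Hypothetical numbers: 2,000 hidden tokens, 500 tokens appended per turn.
single_call = total_input_tokens(turns=1, hidden_context_tokens=2_000, tokens_added_per_turn=500)
agent_loop = total_input_tokens(turns=20, hidden_context_tokens=2_000, tokens_added_per_turn=500)

print(single_call)  # 2500
print(agent_loop)   # 145000
```

Under these assumptions the total grows roughly with the square of the number of turns, which is why trimming what gets re-sent on each call matters more than trimming any single prompt.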
So you are not just paying for the answer; you're paying for the full assembled information scaffolding required to produce the answer in a way that is safe and reliable in a world of probabilities. And given today's prices ride on discounted compute and venture-subsidized economics, these bills will only get worse. After all, OpenAI and Anthropic are soon heading for public markets that will want those subsidies gone sooner rather than later.
You cannot cap your way to business value
Forrester's IT spend management framework is useful here. Visibility into the bill is table stakes, and control capabilities are necessary hygiene. But hygiene is all they are. None of it creates value. Visibility into the total is no longer enough. Enterprises need attribution: which tokens reflected user intent, which grounded the agent in business context, which supported orchestration, and which were avoidable repetition. Without that breakdown, leaders can limit spend, but they cannot improve the economics of the work. Look, we are still very early in the agentic era. When your organisation budgets tokens, you know you're not buying an off-the-shelf product with a predefined return. You're making a deliberate choice to operate at the leading edge, where value only emerges through experimentation. We're all learning how agents behave, what they cost, where they create value, and how that value compounds. And that's OK: that learning will capitalize in your AI reinvention. But cap it bluntly and you are shutting down the reinvention the spend was meant to fund. Optimization is the opposite move. It does not ask how to spend less on intelligence, but how to tie that spend to the outcomes it produces. This is where unit economics come in: you run the system until every dollar buys the most result it can. In an agentic system, what each token returns depends on the context the agent reasons over.
Give it the right context, through GraphRAG or another retrieval layer, at the right moment, and the same token buys a better outcome with less waste. Give it brittle or irrelevant context, and you pay full price for confusion. This is context engineering.
The build bill and the run bill
Two bills hide inside your token spend. First, building agentic systems is CAPEX: that's the experimentation, the coding, the ontology, the wiring of systems into something that hopefully delivers value. It is closer to R&D than to a utility charge, and I would argue this is the vast majority of where the spend lands today. The agents are being put to work building software; few of them are running business processes yet. The artisans of that build are increasingly forward-deployed engineers (FDEs). They embed in the operational mess of workflows, of decision rights, governance, data, and systems, then translate it into working software, shaping the technical decisions until the ontology and the agents run in production. These experiments earn their tokens by showing their unit economics converging inside a window you set, and when they stop converging you kill the workload, not the budget. A robust strategic portfolio management capability is key here to keep reallocating capital to workloads that are compounding. Running those systems is the other bill: runtime inference is the recurring, per-call cost of agents once they are live, including the visible and invisible context assembled for each call. This creates a supplier-side conflict: the same platform that designs the runtime and assembles the hidden context also bills for the tokens it creates, so poor context design can quietly become a recurring platform tax the customer cannot easily see, tune, or challenge. This is pure OPEX, and that's a bill most enterprises have barely begun to see. Here optimization stops being a project and becomes a standing discipline.
If the building bill already shocks you, wait until you receive the running bill as you deploy and scale your agentic workflows. To optimize that spend, I argue that a new discipline will emerge: ContextOps.
ContextOps is the operating discipline
The ontology an FDE constructs is a snapshot of a business that won't hold still: model capabilities evolve, business and IT processes drift, enterprise decision rights change, and the context that made an agent correct on Tuesday stops being correct by the following quarter. While building is a project motion, keeping the thing grounded as the business shifts beneath it is an operating one. ContextOps is the FinOps of the agentic era, born at the same kind of inflection: a spend and control surface that turned continuous, consequential, and ungoverned, until someone had to own it as a discipline rather than a cleanup. Of course, ContextOps governs a different object. FinOps optimizes how tokens are consumed, reaching at maturity beyond raw cost to tie consumption to business outcomes through model routing, caching, and inference economics. ContextOps governs what those tokens represent: whether the agent is still reasoning over a faithful, current picture of how the business runs. In other words, while every cost lever acts on the price of processing context, none of them check whether the context is still true. Imagine an agent that is cheap, fast, well-routed, comfortably under budget, yet approving exceptions against an org chart redrawn two quarters ago. A narrow FinOps view sees a healthy line item. The business has an agent acting, confidently, on a world that no longer exists. Only the discipline watching fidelity catches it. So, ContextOps keeps the ontology current as the business moves, optimizes what each token buys, strips out unnecessary context, and feeds every run back in so the context sharpens instead of staling. Context stops being something you build and becomes something you operate.
Why ContextOps becomes a managed service
One last thought: as companies increasingly use FDEs to build agentic workflows, many will hire service providers to optimize and operate them through ContextOps. The context work is continuous and outcome-focused, and it depends on proximity to business processes and workflows, to leadership and governance, to how demand shows up and decisions get made. That proximity is what makes it possible to harvest decision traces, retire context debt before it becomes a problem, adapt ontologies as the business evolves, and re-ground agents as models, including private models, evolve. But proximity is not granted once. It is earned continuously through demonstrated value. The provider has to keep proving it, and the only way to keep proving it is to improve the performance of the agentic workflows and the fidelity of the context they run on. That is the flywheel: better performance earns deeper trust, deeper trust grants closer proximity, and closer proximity makes the next improvement possible. None of this is a one-time fix. The context degrades, and so does the trust that funds access to maintain it, the moment improvement stops. That is why ContextOps will not be bought as a project. It is a long-lived operational motion, likely to emerge as a managed-services category, because the thing it governs never stops moving.
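The attribution the article asks for (user intent, business context, orchestration, avoidable repetition) can be pictured with a small sketch. This is only an illustration: it assumes each model call has already been split into labelled segments with token counts, which is the hard part in practice and is not something the article explains how to do. The category keys, the function name, and the sample numbers are hypothetical.

```python
# Hypothetical category labels, taken from the four buckets named in the article.
CATEGORIES = ("user_intent", "business_context", "orchestration", "avoidable_repetition")


def attribute_tokens(segments):
    """Return each category's share of the tokens in one model call.

    `segments` is an iterable of (category, token_count) pairs. How a call is
    segmented and labelled is assumed to happen upstream.
    """
    totals = {category: 0 for category in CATEGORIES}
    for category, token_count in segments:
        if category not in totals:
            raise ValueError(f"unknown category: {category}")
        totals[category] += token_count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: totals[category] / grand_total for category in CATEGORIES}


# Made-up example call: a short user prompt wrapped in much larger scaffolding.
example_call = [
    ("user_intent", 200),
    ("business_context", 1_800),
    ("orchestration", 1_200),
    ("avoidable_repetition", 800),
]
shares = attribute_tokens(example_call)
for category, share in shares.items():
    print(f"{category}: {share:.0%}")
```

A breakdown like this is what would let a team target the avoidable share instead of capping the total.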
[3]
OpenAI CEO Sam Altman admits AI token costs are becoming 'a huge issue' -- company seeks improved value as overspending becomes a meme
Are companies not getting more value out of the tokens they spent on AI? OpenAI CEO Sam Altman has said in an interview that companies are now concerned about the growing costs of AI use. Speaking during the Intelligence at Work event, he said this is the first time that OpenAI's clients raised the issue and that the startup is now looking for ways to make its models more efficient. "People are really saying, you know, it's kind of a meme now, but 'My company spent my entire 2026 budget in Q1. Can you make this more efficient?'" Altman said on stage. "We are continuing to push on that more with models. I think we'll have a lot of ways we can help people get more value for less spend. But that went from, at the beginning of this year, an issue that never came up (people were totally happy with the amount they were spending) to, all of a sudden, a huge issue." There have recently been a lot of stories of companies getting massive AI bills as they experiment with "tokenmaxxing." A few company leaders believed that AI use would increase the productivity of their workers, thus increasing revenue. Nvidia CEO Jensen Huang famously said that his engineers should use AI tokens that are worth at least half their annual salary, or else he'd be "deeply alarmed." We also saw another example with OpenClaw creator Peter Steinberger, whose team spent $1.3 million on OpenAI API tokens in a month, totaling 603 billion tokens. However, it seems that this move is starting to backfire on some companies. Amazon employees admitted that they were using AI agents for unnecessary tasks just to stay on the internal AI leaderboard, while Microsoft has reportedly cut back on Claude Code licenses due to increased costs. Even the Uber CEO admitted that there is currently no link yet between going all-out on AI and delivering successful products. Despite that, Altman projects that AI token usage will continue to increase. 
He said that six-and-a-half years ago, the top token spender at the startup used 100,000 tokens a month -- today, that is the global per capita average token usage, and that OpenAI's token leader uses about 100 billion a month. The OpenAI chief also admitted, to his own embarrassment, that someone else uses even more. So, if token usage were to grow linearly, then he would expect the global per capita token usage to hit 100 billion monthly. However, this is likely under the assumption that token prices will decrease faster compared to the increase in the number of tokens used across the globe. Because, at the moment, some are finding that it's now more expensive to run AI models compared with hiring people. Jevons paradox says that the cheaper a particular resource becomes, the more people will use it, and we're seeing this with AI. But as agentic AI becomes more popular and sophisticated, the number of tokens these systems use has been increasing exponentially, seemingly outpacing the efficiency gains that AI labs have been making on training and inference.
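For a sense of scale, the figures quoted in this article can be turned into two simple ratios. This is a back-of-the-envelope sketch: it treats the reported $1.3 million and 603 billion tokens as a single blended rate (ignoring any split between input, output, and cached tokens, which the article does not give), and it only computes the multiple between the two monthly usage figures Altman cited. The function name is hypothetical.

```python
def blended_price_per_million(total_usd: float, total_tokens: int) -> float:
    """Average price per one million tokens, ignoring input/output/cache splits."""
    return total_usd / (total_tokens / 1_000_000)


# Reported monthly spend and volume for Steinberger's team.
price = blended_price_per_million(total_usd=1_300_000, total_tokens=603_000_000_000)
print(f"${price:.2f} per million tokens")  # $2.16 per million tokens

# Multiple between the two monthly usage figures Altman cited.
usage_multiple = 100_000_000_000 // 100_000
print(f"{usage_multiple:,}x")  # 1,000,000x
```

At roughly two dollars per million tokens, the size of the bill comes from volume rather than from the unit price.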
[4]
Big Tech Is Quietly Admitting That If It Wants to Sell People on AI, It Better Be Cheap
The AI mania that's spread across Silicon Valley like a fever over the past few years is running up against some hard economic realities. In recent weeks, big tech companies have been forced to admit that spending on tokens -- the basic unit of measurement for AI usage -- has gotten out of control. Amazon had to shut down an in-house competition to use as many tokens as possible at work, telling employees, "Please don't use AI just for the sake of using AI," according to Business Insider; Uber has reportedly capped employee spending on tokens to $1,500 per month after the company exhausted its annual AI budget earlier this year. And most tellingly, the companies building the big AI models have also woken up to this sobering reality. At a recent event hosted by OpenAI, company chief executive Sam Altman admitted that token usage had become "a huge issue" for companies that were promised big productivity gains if they incorporated AI across their organization. That's a hard pivot from just a few months ago, where the general vibe across the industry was the more that employees use AI, the better off they -- and the companies they work for -- will be. So-called "tokenmaxxing" became a meme, and more or less synonymous with "future-proofing": in a day and age when everyone and their neighbors are using AI, those who know how to use AI will have a sharp edge. Not every job will necessarily be replaced by AI (so the thinking goes), but employees who don't use AI will definitely be replaced by those who do. But AI has always been expensive, and training and inference costs for new models are only getting higher. Meanwhile, the industry's dedicated push into agents -- basically AI systems that can work with little to no human oversight for extended periods of time -- has led to a token usage explosion. One preprint study posted in April found that agents use 1,000 times as many tokens as other AI systems. 
It's the companies and individual users who have overwhelmingly had to eat those costs. No wonder some developers have resorted to pirating free online chatbots like Chipotle's customer service bot, Pepper, to bypass the big companies' token-hungry models. GitHub announced earlier this week that it was rolling out a new payment model in which users would be charged by the number of tokens they burn. Judging from some of the early user feedback, it hasn't been going well. Big tech desperately needs to find a new way to sell people on the future of AI without the exorbitant token costs. If they don't, companies and users will just switch to some open model they can use for free. Close to the edge Some big tech companies have literally been forced to the edge by the rising costs of AI usage. Microsoft and Google recently announced new AI products -- Gemma 4 12B and the RTX Spark laptop, respectively -- that are based on so-called "edge" computing. That's when a model is powered by computing power from a specific device, rather than by the cloud (i.e., energy-guzzling data centers). Obviously, a model of the magnitude of a Claude Opus 4.8 or a GPT-5 isn't going to be able to be run directly from your laptop; that's like trying to provide enough energy for a Falcon 9 rocket launch by plugging a stationary bike into a generator. But the logic behind Microsoft and Google's new products is that actually, not everyone needs the latest, greatest, most token-hungry models directly in the devices they're using daily. For most people most of the time, a smaller, leaner model will work just fine. And crucially, it will save everyone some money on tokens. Make no mistake, Microsoft's and Google's investments in edge computing are minuscule compared to what they're spending on data centers; cloud computing is still very much the backbone of both of their business models. 
But in their embrace of edge computing, we're seeing an, at least tacit, acknowledgement that the cost of massive AI models just isn't worth the squeeze that it's placing on most consumers. Water promises While they're pushing new edge computing products -- promising powerful AI capabilities at a lower cost -- Microsoft and Google are also trying to pacify a public that's become increasingly concerned over data centers' water demands. (Data centers usually use water to keep GPU clusters from overheating.) On Tuesday, during the opening keynote of Microsoft Build, the company's annual developer conference, CEO Satya Nadella claimed Microsoft's new data centers' annual water usage "is roughly equivalent to what a single restaurant would use." The following day, Google announced plans to "replenish more water than we consume" from data center cooling by 2030, along with other "water stewardship commitments." For a little extra sprinkle of intended comfort, the press release noted that "U.S. data centers use less than 1% of the water that Americans use on their lawns annually" -- though that's probably more of a damning picture of Americans' lawn-watering habits than it is an absolution of the water-guzzling sins of the AI industry.
[5]
Revenge of the AI bubble
Reckoning: Companies discovered that AI can be extraordinary when aimed precisely -- and ruinously expensive when treated as a universal productivity machine. Why it matters: The first phase doubted the technology. The second phase worshipped it. The third phase -- currently gaining steam across Corporate America -- questions whether AI's immense power is worth the price. Zoom in: The case against AI used to come from outsiders -- Luddites, "doomers," short sellers betting on a crash. Its newest skeptics are emerging from inside the boom. * Uber capped employee AI usage after burning through its annual Claude Code budget in four months. A top executive said the spending was getting "harder to justify," with no clear link between token use and more useful consumer features. * Amazon shut down an internal token leaderboard after employees gamed it with throwaway tasks to climb the rankings. An Amazon executive told staff, "Please don't use AI just for the sake of using AI." * GitHub moved Copilot, the AI coding assistant used by millions of developers, to usage-based billing as part of its effort to create a "sustainable" business. The change shocked users who were suddenly confronted with the true cost of heavy AI usage. * Bain surveyed 951 large companies and found AI savings falling well below projections, even as most firms planned to spend more. "The technology worked. The value didn't arrive," the report concluded. The intrigue: Even OpenAI CEO Sam Altman has acknowledged the new concerns, calling the question of whether AI spending will show up in revenue "the most fair criticism" of the moment. Reality check: The companies sounding the alarm are the early adopters. Most of the economy is still at the starting line, while the pioneers are the ones absorbing the cost shocks, wasted tokens and employee backlash. * AI is already creating real value for chipmakers, model labs and some power users. 
The harder question is whether that value spreads across the companies paying to deploy it. By the numbers: Wall Street got a fresh reminder Friday of how much AI optimism is baked into markets. * The Nasdaq plummeted 4.2%, its worst day in more than a year, while the Philadelphia Semiconductor Index plunged 10.3%, its worst day in more than six years. * One culprit was Broadcom: The chipmaker reported explosive AI growth, but failed to raise its longer-term AI revenue outlook -- disappointing investors looking for signs that demand was still accelerating. The bottom line: AI can make the right worker dramatically more productive, but those gains depend on knowing exactly where and how to apply it. The real bubble may have been the assumption that AI could be sprayed across companies, employees and workflows and reliably pay for itself.
[6]
Tokenomics - the Salesforce worldview as CMO Patrick Stokes seeks the ideal 'Goldilocks' solution to agentic pricing
Salesforce CEO Marc Benioff recently confirmed that his firm will spend $300 million on tokens with Anthropic this year. He's clearly good with that. Others have had more shocks of late when the size of their latest token charges came in, with the likes of Microsoft, Uber, and Walmart all confessing to huge bills and lining up preventative action to cut them down. Salesforce does not appear to be among those shying away from ongoing token investment, as CMO Patrick Stokes alluded to when he told the Mizuho Technology Conference this week: Do you want to see our [token] leaderboards?...Honestly, it's been insane in our engineering organization, watching the pace of innovation right now is unbelievable - and there are pockets where it's even more unbelievable....Most of that spend right now is coding, but not all of it. This is what I don't think the market has totally caught up to yet. A very material amount of that spend is just knowledge workers. It's my team in Marketing, it's people in Sales, it's people like [Chief Revenue Officer] Miguel [Milano] that are using it to do their forecasting. They're not coding. They are hooking up the MCP servers and consuming tokens from Anthropic to do their knowledge work. And we think that, that's where the kind of next big wave is going to come. OK, but recent weeks have seen 'tokenomics' become an urgent agenda item for many enterprises, and not in a good way. Stokes can see there is a problem here for some: Right now, by far, the single biggest use case for consuming tokens is coding. That's the killer app. One engineer can generate like $100,000 a month bill without much problem. Now you can debate whether that's highly efficient or not, but these labs are making a lot of money on these tokens. This will play out over time, he suggests: There will probably be some sort of normalization or reckoning of that. Right now, everybody is just like, 'Everybody code', and so you hear these stories of people blah, blah, blah.
There's going to be a normalization. That's not going to suit the IPO-poised LLM companies, surely? Stokes says: What the AI labs are looking for is, well, what's the next killer app? What's the next killer use case? And that's where we think Salesforce is perfectly positioned because we think it's knowledge work. We think it's people that every day are showing up and having meetings and analyzing things and making decisions. You have to access information and you want to be able to access information in a way that's low friction, and then you want to be able to connect that to the intelligence of the AI, and that's effectively what we can provide.
Head-on for Headless
Salesforce's latest friction-reducing move has been the introduction of Headless 360. One thing that remains unresolved here is exactly how this will be priced. Stokes offers some insights into thoughts to date, saying that there are two sides to the puzzle: One, there's actually quite a few very important technology decisions that we need to make in terms of how do we ask people that are building agents outside of our platform to identify those agents to us in the same way that they identify a user to us. That's important to solve because that provides the layer of kind of governance and licensing and permissioning that we need to put for the agent to exist and consume from Salesforce in the first place. So there's a number of technology decisions there, many of which will almost certainly result in some sort of agent user license showing up. Just like we have human licenses, we'll likely have agent licenses as well where you have to self identify the agents that you run on top of Salesforce. If you're in Salesforce at least, where we have this 25-year legacy of seat-based pricing, that sounds like the answer. OK, so we're going to charge for the agent licenses. That's possible.
But we're trying to be as thoughtful as we can on this and make sure that our own kind of legacy bias on that doesn't come in too much. We're trying to be very kind of forward in the way that we think about it. There may be alternative approaches that are a better fit, he suggests: What we're doing is we're talking to our customers and our partners, and we're going to them right now and we're saying, 'Look, blank slate, here's your contract. Imagine that you could just re-write your contract right now and have whatever unlimited usage of Headless that you want for the environment that you're trying to create. You tell us what that contract would look like'. Those are the conversations we're having with our customers right now to make sure that we can do it thoughtfully. Customers do all want to know the pricing model so they can model their future growth, he acknowledges: We want you to be able to do that as well, but we want to make sure that we don't give you something that turns out to be wrong and then we have the wrong model, so that's why we're being careful here. What's being sought here is the 'Goldilocks' porridge recipe - not too sweet, not too salty, just right. Lessons have been learned from the rollout of Agentforce, says Stokes: [That] was something that was brand new, and customers didn't know what to expect, so they wanted a consumption model. We gave them a consumption model, but it turns out that is a very complicated consumption model. It's very, very difficult to predict. It was hard enough to predict their own usage because they didn't really know how they were going to use it at the time, and then, even if they could predict their own usage, it was hard to understand the commercials of it because our consumption side was so difficult. So we made it easier. We're like, 'OK, what if you just bought the AELA (Agentic Enterprise Licensing Agreement) and then you don't worry about it?'. Use as much as you want and don't worry about it. 
Customers like that because it's simple, but it's also very expensive. So when buying decisions have to be made: You kind of have to pick your evil. But that's not what we want. We don't want the customer to have to pick their evil. We want to get them to something that they can really trust and believe in. To that end, there are models evolving outside of Salesforce that are of interest, says Stokes: What this comes down to is it's not really about what benefits us; we want to find a model that benefits our customer, so we're going to experiment with as many models as we can. But customers want to have clear direction of travel, surely? Stokes agrees: The downside to that is it looks like we're confusing the market, which I get. Like, how many pricing models do you have? And how do we measure this? But our approach is yes, that's a moment in time, and we kind of just need you to trust us on this. We're going to figure it out. We're going to do it with our customers, and we're going to find the right thing to do. Whatever emerges, it's clear that there are still going to be token-shaped issues that enterprises need to address. Stokes argues that whenever Salesforce brings product to market, the key is to identify what's scarce and meets a need: Right now, intelligence is not really scarce. I mean there's limits on how much energy we can provide in the world to kind of deliver the inference through the GPUs, but the thing that Salesforce delivers that's scarce is context and trust. If you're building anything with AI right now, it's very, very easy to just start talking to an LLM and you get excited about how intelligent the answers sound. But then you realize it doesn't know anything about your business, so you start trying to figure out ways to dump information into it so it understands your business. But that runs into all sorts of complexity and limitations in how much you're putting into the context window. 
You can get about one million tokens into a context window right now, but if you put one million tokens in your context window, that's going to be an insanely expensive call. That one question is going to generate a ton of tokens and the more you put in there, the more confused your LLM will get. So you have to engineer to get the exact context that it needs in the moment, pick the right model, and then deliver an answer. And that's what Salesforce can kind of do behind the scenes in that moment. My take Of course, no discussion of 'tokenomics' would be complete without mention that Salesforce has its very own value metric that goes beyond the merely financial in the shape of Agentic Work Units, a brainchild of Stokes. He explains: This idea of tokens in and out, that's effectively a measure of reasoning or a measure of intelligence, but it's not a measure of actual work getting done. We thought it was important for the market and the technology sector as a whole to have a way to measure actual work getting done by these pieces of intelligence. We introduced that in Q4 and I was very excited to see that continuing to grow, alongside token growth, of course. It is interesting watching the two of them grow - each of them kind of picks up pace at certain moments in time. It's also really interesting looking at the AWUs kind of across different industries and different segments, you can start to see where different usage patterns are emerging. We'll be hearing a lot more about AWUs in the months to come - and also the ongoing burden of token spend...
[7]
The AI bill is coming due. Businesses are learning tokens aren't free
FOMO is strong around AI, with companies adopting the technology willy-nilly. Many have encouraged employees to tokenmaxx to their hearts' content. Now they're starting to realize that freedom comes with a price. Just one in four companies say they have a comprehensive view of what artificial intelligence is costing them, according to an as-yet-unreleased KPMG survey reported by The Wall Street Journal. Only about half have even some visibility into the cost of their AI use. One in five have no visibility, or only see the damage once the bill arrives. "It's a new resource that needs to be managed that didn't exist quite that way, and we're seeing exponential growth," Steve Chase, KPMG's global head of AI, told the Journal. Part of the problem is pinning down what, exactly, AI costs. The basic unit of AI use -- the token -- is an unusual thing to budget around. Each token is a fragment of text, code, or data processed by a model when it reads a prompt or produces an answer, but it doesn't map neatly onto a single word. Some tokens can be cached by AI models, meaning they are not charged again, while others must be processed as new. The result is uncertainty that may not become clear until the bill lands at the end of the month. Multiply that across individual employees at a company, and it is little wonder that chief financial officers are being left with eye-watering bills. KPMG is working with companies that have blown through annual token and cloud-computing budgets in a matter of months, Chase told the Journal, while one client has seen token usage rise sixfold. Axios reported last month that one AI consultant's client spent half a billion dollars in a single month after failing to put usage limits on employees' Claude licenses.
[8]
Corporate America Is Experiencing AI Sticker Shock
For a couple of years, the main problem AI firms had was convincing people -- or, more specifically, people in charge of companies -- to pay for their products. Artificial-intelligence applications were gaining popularity, but they were expensive and not yet easy to deploy in clearly and measurably productive ways. Lots of companies wanted to use AI, insofar as they understood it as a theoretical source of efficiency and a tech megatrend that they didn't want to miss, but early AI pilots and half-baked enterprise tools didn't make a strong case for spending more. In late 2025, though, the case became much stronger: AI coding improved massively as tools like Claude Code and OpenAI's Codex went from coding assistance tools to code-writing tools. This elevated so-called vibe-coding from a fascinating curiosity into a plausible way of working. Here, for companies developing and maintaining software, was a clear and actionable use case, a place to spend real money with the expectation of specific returns, and a way to satisfy a strategic need to be doing something -- anything! -- not to fall behind. A few months later, some companies have a different problem with their AI deployments: Now they're spending too much. This has all been good news for Anthropic, which became the standard-bearer for AI coding, surpassed OpenAI's valuation, and reportedly logged its first profitable quarter this year, far ahead of projections. If the trend continues, the company will ride enterprise AI-coding adoption to a massive IPO this year. (The enormous surge in spending with the frontier AI companies has also quieted talk of a bubble, which AI executives themselves were fretting about as recently as December.) But this period of rapid adoption, for all its surface-level obviousness -- you can just talk through software development now, and the code writes itself -- led to some strange behaviors in the broader tech industry. 
Top-down AI usage mandates became common practice, and companies including Meta and Amazon created internal leaderboards to rank and incentivize AI usage. This resulted in some well-publicized episodes of "tokenmaxxing," where employees at these companies blew through billions of tokens -- the basic unit of information that a model receives and generates -- to unclear ends, throwing AI agents at pointless tasks, using the most expensive models to do simple work, and defaulting to AI for work where other tools might suffice. (Why check your weather app when you can send Claude to check for you? Better yet: Why not whip up a whole new weather app?) This was a brief and manic phase that both Amazon and Meta distanced themselves from quickly. ("Please don't use AI just for the sake of using AI," an Amazon executive told employees last month.) But a less cartoonish version of the top-down deployment model has become common practice: Companies like Salesforce set "minimum" and "ideal" AI usage targets, while countless other firms, large and small, are demanding proof from employees that they were at least doing something with these new tools. It would be fair to say policies like this haven't been popular among employees. Most of the angst comes from the association of AI deployment with job cuts, a prospect that executives are so excited about that some of them have been doing it in advance and at scale. But it would also be fair to say that top-down policies like this have worked in that they've habituated a significant portion of the code-writing workforce to working with generative AI. Now that these companies are getting what they wanted in that sense -- a part of their workforce testing AI tools to see if it can become more productive, reduce its size, or both -- they're back to making more straightforward calculations about whether all this investment is worth it. 
After reports that Uber blew through its allotted AI budget for the year before the end of Q1, the company is instituting limits on AI usage for its employees, according to Bloomberg. According to The Wall Street Journal, they're not alone: Use of artificial intelligence by big companies is exploding -- and the soaring cost has some of them pumping the brakes in a way that could complicate AI's triumphal march across the economy... Executives across industries this year have urged employees to integrate AI tools into their work, spending freely to encourage experimentation and seeking to send a message to Wall Street that their companies won't be left behind in a coming wave of disruption. It's tempting to overstate the swing here. Uber is a tech company, of course, and one that deals regularly with plenty of software-shaped problems that could plausibly be made easier by code-generating AI. But it's also a mature company that makes money from a platform that's already been built, and which contains millions of drivers and riders -- its biggest bottleneck for growth, in other words, probably isn't that it can't produce software quickly enough. It also makes sense that companies with a general interest in AI as a labor-saving tool would underestimate just how costly going all-in on new AI tools actually is: Compared to 2024, for example, the amount of compute used by the top-end models from Anthropic and OpenAI is astronomically higher in 2026. Concerns about cost are "all of a sudden a huge issue," Sam Altman says, but why wouldn't they be? AI firms finally found a product corporate America wanted to buy. It just happened to be much more expensive than corporate America expected. It also turned out to be more expensive than AI firms expected, which has led to some adjustments in pricing. The days of unlimited (and heavily subsidized) personal and enterprise accounts are ending, giving way to more tightly defined usage tiers and metered billing. 
Last month, Microsoft warned customers that change was coming. "Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount," the company said. "GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable. Usage-based billing fixes that." The change happened at the beginning of June, which trickled down to actual workers -- stuck a few caverns below the discursive object of "AI" in the Platonic caverns of corporate America -- in some fairly strange ways. Reports of usage limits and new policies flooded programmer sub-Reddits: 4 days into june and we used like 75% of copilot credits for the department... It's going to be a manual coding month. I work at a big telecom. They're cutting back and placing strict budgets of like $30 per developer unless it's really needed for your workflow. My usage costs have more than tripled for no reason and my usage is the same as last month. I work at fang+ and was told this week we have a budget of 1500/month. "The whiplash is pretty crazy from the guidance we were receiving just a month or two ago," wrote one tech employee. Another agreed: I went through 70% of the monthly usage yesterday before I realized it. It's hilarious because about 2 months ago our leadership decided to go all in on AI. Borderline AI psychosis - devs should be using AI every day, don't write tests by hand when you can use AI, AI code reviews everywhere, AI, AI, AI. These companies -- both the AI firms and their customers -- are clearly still in a discovery phase: AI firms are figuring out what their debt and customers will force or allow them to charge; their customers are trying to nail down what these tools, now that they're at least partially deployed, are actually worth to them. 
For now, despite the headlines about spending caps, these are mostly stories about companies spending a lot on AI with near-term intentions to spend more. But the next few months could tell a different tale, particularly as far cheaper models -- which have tended to run a few months to a year behind the frontier, depending on who you ask -- start to pass through the good-at-coding threshold that triggered this surge in adoption. Remember DeepSeek, the vastly cheaper Chinese AI model that triggered a temporary tech sector sell-off in 2025? According to fintech company Ramp, which tracks AI spending, corporate customers are giving it a second look, as well as "using open source models," which they can run themselves, "in a shift away from OpenAI and Anthropic." Meanwhile, popular AI coding company Cursor has been marketing a version of Kimi, from Chinese company Moonshot, as "frontier-level at coding" at a small fraction of the cost. It won't be long, in other words, until the latest new, dazzling, and expensive use case for AI -- writing decent code -- is thoroughly commoditized, alongside the growing list of other tasks that once sat at the frontier of AI capability. (Asking a chatbot to do research on the web, to choose a mundane example of a function users take for granted today, was limited to paid ChatGPT users just last year.) After finally succeeding in their push for corporate adoption, and getting a taste of real revenue, AI firms could be looking at a different sort of competition: a good old-fashioned price war.
[9]
Tokenomics - the Pegasystems worldview as LLM providers come under fire from CEO Alan Trefler
I can't tell you how thrilled I am now that in the last six or seven weeks, this term 'tokenomics' has come out because in the early 'free drugs' days, which were, say, February, you never heard that. A colorful, if accurate analogy from Alan Trefler, CEO of Pegasystems, on the burning topic of the day - the high price of tokens. Or perhaps, given all the anecdotes about enterprises flinching from being presented with the bill, the unexpectedly high price of tokens. Perhaps they ought to have known better? Eyeing up "all these tokens that the LLMs are giving us for free", Trefler admits: I will confess, we were perhaps a little suspicious. We got more suspicious when they suddenly announced that they were going to build data centers for hundreds of millions of dollars - and then it was billions of dollars and then hundreds of billions of dollars. As a classicist might put it: Timeō Danaōs et dōna ferentēs*. (*Most commonly - beware of Greeks bearing gifts.) Or as Trefler puts it in more up-to-date vernacular: They're building technology to drive consumption of tokens. We shouldn't put ourselves at the mercy of having them drive that, unless there's some real added value. Not only is there no added value having them drive the consumption of tokens by doing reasoning at run time, there's a huge negative in which you can't describe to people how things are going to work until after they happen. Another way For its part, Pegasystems has taken a different approach, he pitches: We went hard down the notion of burn the LLMs tokens like crazy at design time, really exercise that. For everything you design, you're going to run it, hopefully, thousands, tens of thousands, hundreds of thousands of times. When you're running it, only use the LLM for those very narrow tasks where you need to do a translation or something needs to happen. 
And by the way, in a lot of those cases, you can use a lot cheaper LLM because if all you're trying to do is parse some language, you don't need a 1.7 trillion parameter model! The big counter-move from Pegasystems lies in its Infinity 26 token-free agentic AI offering, launched at the PegaWorld conference this week. As per the official blah blah: Clients can now design, build, and run their agentic workflows across Pega Infinity 26 without paying per token. The Pega Predictable AI architecture shifts the heavy AI reasoning to design time, so runtime agents are fast, reliable, and dramatically cheaper to run. This directly addresses two of the most pressing obstacles for enterprises trying to scale their use of AI agents: escalating token costs and unreliable outcomes. Trefler picks up the story: In Infinity '26, we're not going to charge for tokens. It's not because we're using so many tokens, but are going to underwrite it, which, by the way, is what the models did. This style of use treats tokens as if they are something that should be treated with respect, which they should because the other consequence of tokens is burning forests [with] the incredible amount of electricity it takes to run a Graphical Processor Unit to come to a conclusion. Complexities That's all sound thinking, but given how turbulent a ride the AI hype cycle has taken enterprises on over the past few years, is the emergence of 'tokenomics' just a fresh complication to have to factor into strategic thinking? Trefler reckons that customers are trying to figure all this out: The customers are hearing amazing and wondrous things. They're trying to figure out what's right. The level of skepticism and suspicion has massively risen in the last month. This conversation that we're having, I will tell you, is resonating with the senior executives that I was talking with. 
I was talking with the CIO of a Fortune 50 company who has a travel ban and wasn't sending anyone to PegaWorld...She said, 'We've been talking about this a lot. We're not sure what's going on. And you guys obviously know something that we want to learn more about'. So, I think we have an opportunity to do some education here, but the cacophony out there is huge, and it's a very confusing moment. The emerging lexicon isn't helping, he suggests. The past few weeks have seen OpenAI and Anthropic both release Reasoning Tokens, internal tokens used by models to 'think' through complex tasks before producing a response. Do users understand what a Reasoning Token is or what they're paying for here - because despite some sleight of hand, they are still paying for them, says Trefler: Claude and OpenAI both added these new models. First, they claimed they were dropping their token prices. They dropped some token prices on some old stuff, but they added new models that don't have dropped token prices. And these models, they're going through a thought process. What they're doing there is generally called reasoning. But there's a catch here: You used to think you know how many tokens you were using. You used to think that you'd know that if you asked a question or put something in or uploaded a document, you might use 100 input tokens, and then the thing would think and it would give you maybe 400 output tokens. Now it turns out the input tokens are 1/3 the price of the output tokens. But, all right, I got 400 tokens, I can calculate the price, etc. I kind of know what's involved. Anyway, you get these input tokens, you get these output tokens - do you think that's the tokens you're using? No, no, no. Your LLM, when it's doing that thinking is generating tokens for itself. 
If it actually decides, as many of these do, that it wants to split the problem up and create what's called the sub-agent to go off and research a little bit of something, it uses tokens to go talk to the sub-agent and then it pulls those tokens back together to try to figure out what the conclusion is. It's not atypical for the total number of tokens that you get charged on to be five to 10x the number of tokens in your input and output - and you don't see them on your bill. He's seen this at first hand, it seems: Two weeks ago, I asked Claude, 'Can you tell me how many tokens we just used?'. [The response was] 'I don't have that record. You'll be able to see that on your billing'. This is, I think, indicative of why when these new technologies come out, good organizations, wise organizations always ask, 'How can this operate against us?' I'm so glad we did when we were making the design decisions about how and when and where to use LLMs because they're not done yet. They have to raise a lot more money for them to get the IPO valuations they want. As for the 'SaaSpocalypse' coming for software firms, the theory so beloved by the silver-bullet chasers on Wall Street, Trefler has no time for this: I cannot over-state the complete amount of confusion that has been created by the LLM companies basically who we use and love, right? But declaring war on software and trying to say that the entire TAM [Total Addressable Market] of the software industry is something they're just going to wipe out - I don't think they're wiping it out ever, certainly not anytime soon. There are parts that are vulnerable, there are elements of software that inevitably when big inventions happen, have to change. But the ability to manage work in a predictable way by using intelligence and then leveraging the intelligence isn't something that I see going away. In fact, I see that as something that's going to be required much, much more. That's what our bet is. 
My take Trefler tells it like it is: The whole fact this industry called them tokens, I think it is just an example of the BS that's going on here because there's a mathematical relationship between tokens and words, and they could have called them words, but that would have been more transparent. So tokens, that's obviously a lot more mythical here as well.
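Trefler's billing arithmetic can be made concrete with a rough sketch. The figures taken from his remarks are the 100 input tokens, the 400 output tokens, input priced at one third of output, and a total billed token count of five to 10x the visible input plus output. Everything else is an assumption made for illustration: the names `Pricing`, `visible_cost` and `billed_cost` are hypothetical, the $15 per million output tokens is a placeholder rather than any vendor's actual rate, and the hidden reasoning and sub-agent tokens are assumed to be billed at the output rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price sheet; not any real vendor's rates."""
    output_per_million: float        # assumed USD price per 1M output tokens
    input_ratio: float = 1 / 3       # input costs 1/3 of output (from the quote)

    @property
    def input_per_million(self) -> float:
        return self.output_per_million * self.input_ratio


def visible_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of only the tokens the user can see: the prompt and the answer."""
    return (input_tokens * p.input_per_million
            + output_tokens * p.output_per_million) / 1_000_000


def billed_cost(p: Pricing, input_tokens: int, output_tokens: int,
                total_multiplier: float) -> float:
    """Cost when total billed tokens are `total_multiplier` times the visible ones.

    Assumption: the extra (hidden) tokens are charged at the output rate.
    """
    if total_multiplier < 1:
        raise ValueError("total_multiplier must be at least 1")
    visible_tokens = input_tokens + output_tokens
    hidden_tokens = visible_tokens * (total_multiplier - 1)
    return (visible_cost(p, input_tokens, output_tokens)
            + hidden_tokens * p.output_per_million / 1_000_000)


if __name__ == "__main__":
    pricing = Pricing(output_per_million=15.0)  # placeholder price
    print(f"visible only: ${visible_cost(pricing, 100, 400):.4f}")
    for multiplier in (5, 10):
        cost = billed_cost(pricing, 100, 400, multiplier)
        print(f"{multiplier}x total tokens: ${cost:.4f}")
```

Under these assumptions, a call that appears to cost about $0.0065 is billed at roughly $0.0365 to $0.074. The cost multiple comes out slightly higher than the token multiple because the hidden tokens are assumed to be charged at the more expensive output rate.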
[10]
The agentic AI flywheel is coming for your budget - Celonis Field CTO on token economics
The token cost conversation in enterprise AI has been building since the start of this year. My colleague Jon Reed has pointed out the problem of "tokenmaxxing" - the tendency to throw frontier model compute at every problem without asking whether it's warranted. There is a school of thought that as AI tooling becomes more of a burden for the CFO and cost functions in the enterprise more broadly, lighter weight models and more deterministic systems will become important alongside the frontier models. Equally, digital leaders have been comparing AI cost management to the cloud FinOps problem, with some reporting that organizations are already rationing daily token credits to employees. I also wrote in January that the question of who actually pays for agentic AI - and from which budget - was going to be a limiting concern for buyers. All of this is to say that whilst many are coming around to the idea that AI is going to be utilized broadly across the enterprise, there is ongoing concern about who pays for what, how to manage the rising costs, and more importantly, how to tie those costs to perceived benefits. I spoke recently with Manuel Haug, Field CTO at Celonis - who focuses on strategic direction and works closely with what he describes as lighthouse customers on extracting value from the latest AI technologies - to understand what enterprises are actually experiencing as agentic AI moves from pilot to production. The flywheel Haug frames the current environment across customers in three different stages. The first stage is one many will recognize: a person goes to an AI system and asks it something, via a copilot or chatbot type tool (summarization, drafting, etc). The second - and where he says many Celonis customers are operating now - is human in the loop: an AI proposes actions embedded in a business process, but the human remains the decision-maker. 
The third stage, which he says some customers are actively trying to reach, he calls "human on the loop" - agents operating with near-autonomy across a defined scope, with humans orchestrating rather than participating in each individual decision. The shift from stage two to stage three is where the cost curve changes significantly. On what he is seeing from customer conversations, Haug says: There's an almost exponential curve in the usage uptake of these agents. The more you let them operate independently, and the more power users you have with multiple agents working for them, the more the cost multiplies - not a gentle, steady climb, but a genuine jump from very little cost to a lot in a short period of time. He describes a rapid flywheel, where one agent becomes three, three becomes ten, sometimes within days. The same dynamic driving agentic AI's productivity case (although some of this is debatable at present too) is what is driving the budget concern. The wrong model Part of the problem, Haug argues, is that buyers are bringing the wrong framework to the economics. Enterprise software has been priced on seats for decades - predictable, flat-rate, easy to budget. Of course vendors have woken up to the risks associated with the 'SaaSpocalypse' and are adjusting their charging models to be a mix of seat-based and consumption-based pricing - but for enterprises, the risks exist too. Haug says: AI vendors started with a SaaS model: seat-based, flat-rate. But it then switched to token-based pricing, at least for heavy usage, and that's much closer to infrastructure pricing. Think about how you'd price a service on AWS - infrastructure components priced on consumption, on compute resources. Token utilization is much closer to compute resource than to a seat. The conflict this creates is that buyers expecting seat-like predictability are encountering a variable compute cost that scales with usage. 
And unlike cloud infrastructure - where IT has established governance frameworks - token spend is often arriving outside of IT's line of sight entirely. This connects to something I raised in January, where I argued that nobody in the enterprise clearly owns the agentic AI budget. Haug confirms the pattern from his customer conversations and says that business teams are moving faster than corporate IT: Many AI solutions originate on the business side: a procurement team or supply chain team sees tremendous benefit from implementing an agent and just does it, almost shadow IT-style. Corporate IT is then catching up, providing horizontal layers and governance. That ownership disconnect is, he suggests, a significant reason why costs arrive as a surprise. It isn't that the spend is inherently unmanageable - it's that nobody has been assigned to manage it until the bill arrives. Compounding this is the model tiering question that most enterprises haven't yet thought through properly. Haug notes that most initial AI projects default to frontier models - the most powerful and most expensive options - to demonstrate value quickly. The optimization, moving to lighter and cheaper models as use cases are proven, happens later, and only in more mature deployments. For many organizations, it hasn't happened at all yet. Tying spend to value The harder problem is what to do about it and Haug is sceptical of approaches that treat token cost as a pure IT efficiency question. What he describes from Celonis customers is a practice he calls "agent mining" - treating agents as stakeholders in the business process and interrogating them with the same rigour applied to any other capital investment: We treat agents as a stakeholder in the process and monitor them the same way we'd monitor whether any other investment in the company makes sense. Does transitioning to this new technology actually produce the right outcomes? 
We look at the agent's logs, we see which decisions they're involved in, where they're most active in the business process, and we correlate that to outcomes. The outcomes he points to are process-level - for example, on-time delivery rates or customer satisfaction. I asked whether enterprise buyers are developing anything like a revenue-per-token metric at enterprise level. He says: What I haven't seen is an enterprise-level revenue-over-token metric applied across the whole organization. It's more domain-specific at this point. Simply put, the value case for AI spend exists, and domain-level measurement is happening. But the enterprise-wide view that would make the CFO conversation straightforward is not yet there. The architecture argument For Haug - and Celonis - the durable answer to the cost problem is architectural rather than operational. The Celonis Context Model (CCM), which the company launched in May alongside the acquisition of MIT-founded AI decision intelligence company Ikigai Labs, is central to this. Rather than requiring each new agent to reason from raw data - a compute-intensive process that consumes context window space expensively - the CCM provides a shared backbone that agents query via tool calls: Instead of dumping all the data into the LLM's context window and having it reason on raw data to figure out, say, how to calculate on-time delivery rate, that question becomes a tool call to the Context Model. This does make theoretical sense, as the cost economics compound over time. Each subsequent agent built on the shared foundation costs incrementally less, because the context work is already done. The build process is, in Haug's words, almost industrialized - agents retrieve information through standardized mechanisms rather than reinventing from scratch each time. He also points to context engineers as being a key role in enterprise AI architecture over the next few years. 
It is the more mature evolution of the prompt engineering discipline that appeared two or three years ago, now applied at the level of enterprise systems:
What I foresee happening in enterprise architecture is that there will be a group of people whose job is context engineering - building something like a Context Model to enable multiple agents across the enterprise.
His closing advice to buyers is that if you manage tokens as an IT cost in isolation, you risk cutting the agents generating the most value. Buyers should be thinking constantly about how to keep the link to business outcomes:
What's critical is being absolutely focused on attaching that to value. Is it driving on-time delivery?... Is it driving customer satisfaction? That's what matters.
My take
The architecture argument Haug makes for a shared context layer makes sense - if you have the right context and data foundation in place, this should reduce the cost of output as agents have to do less work each time. We are hearing similar arguments across the enterprise AI stack - from process intelligence to data platforms to the major cloud providers. The question is whether most enterprises will get there before budget pressure forces a less considered response.
As we saw with the adoption of cloud computing many years ago, the shadow IT cost - of department leads spending on their credit cards - took time and strategic planning to get under control. I think it will be some time before we really understand how to not only manage these costs for agentic AI, but also explicitly tie these costs to value creation - only then do I think we will fully be able to drive agentic AI at scale. Otherwise, the cost burden seems too much and projects will be abandoned. The business case requires serious thought, and whilst it is tempting to run off and deploy the latest AI tooling, the cost question should be embedded in deployments from the start.
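The tool-call pattern Haug describes can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the order records, the `on_time_delivery_rate` tool name, and the rough four-characters-per-token estimate are hypothetical and are not taken from Celonis or any real Context Model API. It only contrasts the size of a prompt that carries raw data with one that carries a tool call and its result.

```python
import json

# Hypothetical order records. In the pattern described above these would
# sit behind the shared context layer rather than inside the prompt.
ORDERS = [
    {"id": i, "promised_day": 10, "delivered_day": 9 + (i % 4)}
    for i in range(1000)
]


def rough_token_count(text: str) -> int:
    """Crude estimate (about 4 characters per token); not a real tokenizer."""
    return len(text) // 4


def on_time_delivery_rate(orders) -> float:
    """Hypothetical tool: the metric is computed once, outside the model."""
    on_time = sum(1 for o in orders if o["delivered_day"] <= o["promised_day"])
    return on_time / len(orders)


# Option 1: place all raw records in the context window.
raw_prompt = "Compute the on-time delivery rate:\n" + json.dumps(ORDERS)

# Option 2: the agent emits a tool call and receives only the result.
tool_call = {"tool": "on_time_delivery_rate", "arguments": {}}
tool_result = {"on_time_delivery_rate": on_time_delivery_rate(ORDERS)}
tool_prompt = json.dumps(tool_call) + json.dumps(tool_result)

raw_tokens = rough_token_count(raw_prompt)
tool_tokens = rough_token_count(tool_prompt)

print(f"raw-data prompt: ~{raw_tokens} tokens")
print(f"tool-call prompt: ~{tool_tokens} tokens")
```

The gap between the two estimates grows with the number of records, which is one way to read the claim that each additional agent built on a shared foundation costs less than one reasoning over raw data.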
[11]
Rising token costs are no laughing matter. OpenAI's Sam Altman isn't smiling...
The soaring cost of tokens is causing concern among AI implementers, admits a serious-looking OpenAI CEO Sam Altman, but he doesn't understand why? In a chat hosted by OpenAI's Chief Revenue Officer Denise Dresser, Altman said that the second biggest theme he hears about from customers right now is cost:
People are really saying, - it's kind of become a meme now - that, 'My company spent my entire 2026 budget in Q1 to kind of make this more efficient.'... That has become that way from the beginning of this year. [Before that] it was an issue that never came up. [It went from] people were totally happy with the amount they were spending to all of a sudden a huge issue.
But we have seen plenty of evidence of token resistance kicking in of late. As we noted earlier in the week, recent times have seen US supermarket giant Walmart reportedly capping staff usage of an AI agent called Code Puppy, Uber reporting that it burned through its annual $3.5 billion AI budget in four months, and even Microsoft pulling Claude Code access for around 100,000 engineers after finding the costs impossible to justify. Google CEO Sundar Pichai took time out during his address to the Google I/O developer conference to acknowledge the problem:
We've heard that many companies are already blowing through their annual token budget; it's only May!
To be fair to Altman, he does recognize that people are using a lot more tokens, but this elicits the comment that:
If you all buy a ton of tokens from us, we're very happy.
And what he says is couched as a, presumably, amusing anecdote, although he doesn't seem to smile when he tells it:
To give people a sense of just how big the magnitude of the challenge in front of us is, six-and-a-half years ago, which was the earliest I could find data for, the token leader at OpenAI used about 100,000 tokens a month. It was probably very likely the token leader in the world today, six-and-a-half years later, that is about the per capita average in the world.
[Today] the token leader at OpenAI uses about 100 billion tokens a month. To my embarrassment, that is not the token leader in the world - we found someone that uses even more!
Hah hah, er, hah! (Still no smile...)
More to come
The OpenAI CEO plows on with a warning:
It's still a one million times increase in six-and-a-half years. If the same trend happens again, where we project forward six-and-a-half years, and that becomes the global average of tokens per capita, or AI use per capita...You can start to wrap your head around the infrastructure challenge in front of us, and what it will take to be ready for that moment.
The trouble is that the world is not set up for everyone who wants to use AI en masse to be able to do so. That's a challenge for the next phase of the revolution, suggests Altman:
We have to build infrastructure to let people, companies, scientists, everyone use AI at the scale it deserved from that. whatever they want, whatever they need, whatever they like to do with it, and have this just be something that we think of as a resource that is seeped into everything.
This last point is familiar Altman territory. Earlier this year at an appearance at BlackRock in Washington, he told his audience:
We see a future where intelligence is a utility, like electricity or water, and people buy it from us on a meter.
As an analogy it begs so many questions, not least what happens if you can't afford to fill the meter? Sam cuts you off from intelligence? From knowledge? From information? What are the societal implications of that idea taken to extremis - or indeed the political ones? How soon before there's a two-tier world of knowledge haves and knowledge have-nots based on whether you can pay the bill?
But the controversy his remarks sparked back then either passed him by or he just doesn't accept there's any issue here as he returns to the topic undeterred, with a remark that some might think suggests that Altman's connection with people's day-to-day real lives has been broken:
You probably don't think about the price of electricity or water that much. You pay a bill for them, but you just know they're there.
Try telling that to someone who just got their latest electricity bill, someone who hasn't made billions out of running a loss-making AI start-up that shows no signs of turning a penny profit for years to come, despite a ludicrous theoretical market valuation. The brown envelope from the water company may not fill anyone in the Altman household with dread, but for those out there in the real world it's a different matter.
But Altman's plans for OpenAI will carry on regardless, it seems. This is a guy who happily says:
If you all buy a ton of tokens from us, we're very happy.
Now he predicts that in the next phase:
We really have to become this hugely expensive piece of infrastructure that the world will use for all this stuff.
Will the cost of that be taken on board by AI providers and not passed on to customers? Sam? Sam!?! Needless to say, he's still not smiling.
My take
I'm not entirely convinced that Altman's comments are necessarily a terribly good pre-IPO conversation to engage with, but that's his problem. Then again, I really also just don't think he gets what the token cost problem really means for so many. Maybe you could argue that ultimately that will be his problem as well, but as with so much of what he says, I fall back on my personal reading that he seems to view so much of the world as an academic exercise or a theoretical landscape, detached from signs of human empathy.
To be fair, he does say:
We want you all to be able to use AI and never worry about it being great and affordable, and there being a lot of it, and we got to go do that.
OK, but will that happen? And how? Will the likes of OpenAI rise to the challenge and live up to the claims of CRO Dresser that:
Our ambition is to bring intelligence to every human being in the world for good. We are committed to this.
But at what price? Something needs to change here. The build-out costs of the scale that Altman alludes to aren't sustainable to produce profitability for the hyperscalers or indeed for end users, who aren't just balking at the size of the token bills they're being presented with, but also at the lack of associated value that they perceive they're getting from paying out such amounts.
It's that value association recognition that Salesforce is trying to address with its proprietary Agentic Work Unit (AWU) idea. This is a metric devised by Salesforce CMO Patrick Stokes which is pitched as measuring the actual work performed by AI agents across the Salesforce platform. While conventional token-centered AI metrics focus on the raw data processed by a model, the claim for AWUs is that they will quantify the end-to-end tasks completed, and as such will provide a clearer picture of AI's actual business ROI.
Coincidentally, as Altman was making his latest remarks, Salesforce co-founder Parker Harris addressed this issue at the Evercore Global TMT Conference in San Francisco, where he said of the tokens metric:
This is how much we've paid model providers and helps with their S1 [regulatory filing]. It's kind of like leaderboards for vibe coding - are you token maxing and who's using the most tokens to write code? People say, 'Well, that's a terrible metric because people are just going to try to use the most tokens' and it's not really a right metric for output. AWU is definitely the right metric.
Convincing the rest of the industry to buy into that Salesforce metric isn't a stated objective to date - there's no push to make it a de facto standard, let alone a de jure one - but how receptive people on the buy side are to a value-centered metric that they can believe in will be (a) interesting to observe and (b) illustrative to other vendors as an approach to emulate.
In the meantime, if you'll just excuse me, I need to go and argue with my electricity provider about my latest quarterly bill... and that's nothing to smile about, I can assure you!
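As a rough check on the figures Altman quotes in this piece (about 100,000 tokens a month six-and-a-half years ago against about 100 billion today), the arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and the annualised figure assumes smooth compound growth, which is a simplification and not something Altman claimed.

```python
# Figures as quoted by Altman; everything else here is illustrative.
early_tokens_per_month = 100_000           # token leader, 6.5 years ago
today_tokens_per_month = 100_000_000_000   # token leader today
years = 6.5

# Overall growth factor between the two quoted figures.
growth = today_tokens_per_month / early_tokens_per_month

# Equivalent yearly multiplier, assuming smooth compound growth
# (an assumption made only to put the figure on an annual footing).
annual_factor = growth ** (1 / years)

print(f"overall growth: {growth:,.0f}x")
print(f"implied yearly multiplier: about {annual_factor:.1f}x")
```

The overall factor matches the "one million times" in the quote; the yearly multiplier is only a way of expressing the same growth per year.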
Major tech companies are slamming the brakes on AI spending after burning through entire annual budgets in just months. Uber capped engineers at $1,500 monthly after exhausting its 2026 AI budget by April, while Microsoft revoked Claude Code licenses. Even OpenAI CEO Sam Altman admits AI costs have become 'a huge issue' as agentic workflows drive token consumption to unsustainable levels.
The AI adoption frenzy has collided with stark economic reality. Uber exhausted its entire 2026 AI coding budget by April, forcing the company to cap every engineer at $1,500 per month [1]. Microsoft revoked its developers' Claude Code licenses months after enabling them, while a Priceline employee reported a routine Cursor contract renewal returning 4-5x more expensive [1]. ServiceNow similarly burned through its full-year Anthropic coding budget within the first few months [2]. These runaway AI costs represent a dramatic shift from early 2025, when companies gorged themselves on all-you-can-eat subscriptions without understanding the financial consequences.
Source: Fast Company
The crisis extends beyond individual companies. J.R. Storment, executive director of the FinOps Foundation, told TechCrunch he started hearing from companies in April and May reporting they were 3x over their entire 2026 token budget [1]. One company reportedly faced a $500 million Claude bill after forgetting to set usage limits for employees [1]. The cost of AI adoption has transformed from a minor concern into what Chris Reed, senior director of IT finance at Priceline, describes as "the crack-cocaine epidemic" where companies got hooked on subsidized pricing and now find themselves dependent [1].
OpenAI CEO Sam Altman publicly admitted that AI costs have become "a huge issue" for the first time during the Intelligence at Work event [3]. "People are really saying, you know, it's kind of a meme now, but 'My company spent my entire 2026 budget in Q1. Can you make this more efficient?'" Altman said on stage [3]. This marks a complete reversal from the beginning of the year when clients were "totally happy with the amount they were spending" [3].
Source: Tom's Hardware
Alexander Embricos, OpenAI's head of enterprise, confirmed the shift in customer conversations. "Six months ago, I would have a conversation with a customer and it would be all about 'What can it do? Is it good enough?'" he told TechCrunch. "Our conversations are never about that now. Now the conversations are about, 'hey, we're spending so much. What visibility do you have? What auditability do you have? What token controls do you have? What is the efficiency of your models?'" [1]. Altman also acknowledged that the question of whether AI spending will show up in revenue is "the most fair criticism" of the moment [5].
The explosion in AI token usage stems primarily from agentic workflows that loop repeatedly rather than making single model calls. Anthropic's engineering research found a single AI agent consumes 4x the tokens of a chat interaction, while multi-agent systems use 15x [2]. Each loop refills the context window, creating what experts call context debt: the runtime tax companies pay when their knowledge isn't machine-readable [2]. Nicholas Arcolano, head of research at Jellyfish, reported that per-developer consumption rose approximately 18.6x in nine months due to agentic features [1].
One preprint study found that AI agents use 1,000 times as many tokens as other AI systems [4]. New models released in November, including Anthropic's Claude Opus 4.5, OpenAI's GPT-5.1, and Google's Gemini 3 Pro, brought significant improvements to agentic tools that multiplied consumption [1]. This exemplifies Jevons paradox: as AI tokens become cheaper per unit, total usage increases so dramatically that overall spending rises [3].
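The Jevons effect described above comes down to one multiplication: spend is unit price times volume. The short Python sketch below uses the 18.6x per-developer consumption growth reported by Jellyfish; the 50% price cut is a hypothetical value chosen only for illustration, and the function name is likewise invented here.

```python
def spend_multiplier(price_factor: float, usage_factor: float) -> float:
    """Change in total spend when unit price and volume both change."""
    return price_factor * usage_factor


usage_growth = 18.6   # per-developer consumption growth cited by Jellyfish
price_factor = 0.5    # hypothetical: per-token price falls by half

# Spend still rises, because usage grew faster than the price fell.
multiplier = spend_multiplier(price_factor, usage_growth)

# Price factor at which spend would stay flat for this usage growth.
break_even_price_factor = 1 / usage_growth

print(f"spend changes by about {multiplier:.1f}x")
print(f"break-even price factor: about {break_even_price_factor:.3f}")
```

Under these assumptions the per-token price would have to fall by roughly 95% before total spend stopped rising, which is why cheaper tokens alone have not reduced the bills described here.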
Managing AI expenditures has become critical as companies struggle to demonstrate ROI. A March survey by Faros AI found that among 20,000 developers, output was rising, but so were bugs and rewrites [1]. Jellyfish discovered engineers who used the most tokens were about twice as productive as those who used AI less, but they spent 10x the number of tokens to achieve those gains [1]. These statistics make the productivity case far murkier than AI spending suggests.
Source: TechCrunch
Vitaly Gordon, CEO of Faros AI, recounted speaking with a CTO who said: "One of my engineers spent $40,000 on tokens last month, and I genuinely don't know whether I should stop him or should I go and tell everyone else to be like him" [1]. Uber's COO admitted that between all the Claude Code spending and anything customers can feel, "that link is not there yet" [2]. Bain surveyed 951 large companies and found AI savings falling well below projections, concluding: "The technology worked. The value didn't arrive" [5].
A market is forming to address the crisis in optimizing AI expenditures. The Linux Foundation unveiled the Tokenomics Foundation, a new standards body aiming to instill the same cost discipline around AI token usage that FinOps brought to cloud spend [1]. Storment explained that tracking cloud costs is a hundreds-of-millions-of-rows-a-month data problem, while tracking token costs is a trillions-of-rows-a-month data problem requiring fundamentally rethought tooling, specs, and accounting systems [1].
Companies like Pay-i are emerging to track, measure, and optimize the costs and performance of GenAI investments, while Paid lets developers track costs and bill users based on actual value rather than subscription fees [1]. GitHub announced usage-based billing for Copilot, shocking users confronted with the true inference costs of heavy AI usage [5]. Microsoft and Google recently announced edge computing products (Gemma 4 12B and the RTX Spark laptop) that run smaller models directly on devices rather than through energy-intensive data centers, tacitly acknowledging that massive large language models aren't worth the cost for most daily tasks [4].
Amazon shut down an internal token leaderboard after employees gamed it with throwaway tasks, telling staff: "Please don't use AI just for the sake of using AI" [4]. The Nasdaq plummeted 4.2% in its worst day in over a year when Broadcom failed to raise its longer-term AI revenue outlook, reminding Wall Street how much optimism is baked into markets [5]. As companies confront the reality that AI must be applied precisely rather than universally, the industry faces a critical question: whether AI's immense power justifies its price tag.
Summarized by Navi