FOMO Drives GPU Overbuying as 95% of Capacity Sits Idle


Companies are pumping billions into AI infrastructure that’s largely unused, according to a report released Tuesday by Cast AI, a global automation platform for cloud-native and AI workloads.

Based on data from 23,000 Kubernetes clusters, the report found that average GPU utilization across enterprise servers is just 5%. In other words, 95% of provisioned GPU capacity is not being used.

The report noted that a CPU core sitting idle costs cents per hour, while a GPU sitting idle costs dollars. For the first time since EC2 launched in 2006, GPU prices are rising, not falling. In January 2026, AWS raised H200 Capacity Block prices by 15%, citing supply and demand. The increase breaks a two-decade pricing trend.

At these prices, the hoarding instinct makes sense, the report acknowledged. Lead times are long, and releasing unrecoverable capacity feels riskier than overpaying. But at 5% utilization, the math doesn’t work, and the hoarding feeds the scarcity loop that drives prices higher.
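The report's claim that "the math doesn't work" at 5% utilization is easy to verify: the effective price of each useful GPU-hour is the list price divided by utilization. The sketch below illustrates this with a hypothetical hourly rate, not a quoted AWS price.

```python
# Sketch: effective cost per *useful* GPU-hour at low utilization.
# The $4.00/hour rate is hypothetical, chosen only for illustration.

def effective_hourly_cost(list_price_per_hour: float, utilization: float) -> float:
    """Cost per hour of actual GPU work when only a fraction of capacity is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return list_price_per_hour / utilization

# At the report's 5% average utilization, every useful GPU-hour
# effectively costs 20x the sticker price.
print(effective_hourly_cost(4.00, 0.05))  # → 80.0
```

At 5% utilization, a 15% price hike on top of a 20x effective multiplier compounds the problem rather than causing it.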

“This was shocking to us, and shocking to our customers,” Cast AI CEO Laurent Gil told TechNewsWorld. “Almost nobody realized they were not using those machines very well.”

Fear of Being Compute-Strained

“Your ambitions have to be pretty large to overpurchase GPUs,” added Alvin Nguyen, a senior analyst for infrastructure outsourcing, data center services, and semiconductor research at Forrester Research, a multinational market research company headquartered in Cambridge, Mass.

“Unless you’re a hyperscaler, neocloud or AI startup, the chances of you having the use cases to justify the overpurchase of GPUs really isn’t there,” he told TechNewsWorld.

Dan Herbatschek, CEO and founder of Ramsey Theory Group, a technology holding and innovation firm headquartered in New York City, explained that organizations are over-indexing on capacity because they’re anticipating AI use cases that have not yet been operationalized.

“C-suites fear being compute-strained when agentic AI systems go live,” he told TechNewsWorld. “We work with enterprise organizations, and they are buying ahead of demand. But it is really difficult to justify that investment now, when most companies lack production-ready use cases.”

“The last time I saw this was with cloud,” he continued. “We’re really in an AI capacity bubble phase. Leaders are losing sight that it’s not about who has the most compute, but can you actually convert compute into ROI/business outcomes.”

Fear Fuels GPU Overcapacity

Debo Ray, founder of DevZero, a cloud infrastructure and developer productivity company in Seattle, agreed that fear is the number one reason companies are investing in AI infrastructure that remains idle.

“If you have one bad outage, teams overprovision,” he told TechNewsWorld. “If you have one missed GPU reservation, leaders panic-buy capacity. Provisioning decisions are made reactively, and nobody revisits them after the crisis passes.”

“We’ve seen clusters with 96 GPUs allocated running at 23% utilization, with 31 replicas sitting idle for 22 hours a day,” he said. “Teams get labeled negligent for this, but when there’s no feedback loop and no one watching the gap, overprovisioning is the rational call. The hoarding instinct is a direct response to scarcity anxiety, and the scarcity anxiety is partly real.”

“When capacity is genuinely hard to get back, holding onto it makes sense,” he continued. “The structural problem underneath is that the team setting resource requests isn’t the team paying the cloud bill, so the padding never gets revisited, the cluster autoscaler responds to inflated requests as if they were real demand, and waste compounds quietly.”
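The mechanism Ray describes can be sketched in a few lines: the Kubernetes cluster autoscaler sizes the cluster from declared resource requests, not from observed usage, so padded requests become real nodes. All numbers below are hypothetical, assumed only for illustration.

```python
# Sketch of the dynamic described above: the autoscaler provisions nodes
# based on what replicas *request*, not what they actually use.
# Replica counts, GPU requests, and node sizes here are hypothetical.
import math

def nodes_provisioned(gpus_requested_per_replica: int, replicas: int,
                      gpus_per_node: int) -> int:
    """Nodes the autoscaler must add to satisfy the declared requests."""
    return math.ceil(gpus_requested_per_replica * replicas / gpus_per_node)

replicas = 24
gpus_per_node = 8

# Each replica requests 4 GPUs "to be safe" but typically uses 1.
padded = nodes_provisioned(4, replicas, gpus_per_node)   # → 12 nodes
actual = nodes_provisioned(1, replicas, gpus_per_node)   # → 3 nodes
print(padded, actual)
```

Because the team writing the requests never sees the bill for the nine extra nodes, the padding persists, which is exactly the missing feedback loop Ray points to.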

Idle GPUs have significant ramifications for companies. The most obvious is cost. “Organizations are paying premium prices for economy-class utilization,” Ray explained.

There’s also a problem with what doesn’t get built. “AI infrastructure capacity that exists but sits idle isn’t just waste. It’s an opportunity cost,” Ray said. “Teams tell us they’re waiting on GPU access to run experiments. The capacity already exists inside their own clusters, but they just don’t know it.”

“There’s also a reliability paradox most people miss,” he added. “The assumption is that overprovisioning buys you safety. It often does the opposite.”

Capital Impact

Gerald Ramdeen, founder, CEO, and CTO of Luxcore, a semiconductor and optical networking company in New York City, pointed out that one of the biggest impacts of idle GPUs is poor capital efficiency and weaker returns on infrastructure spend.

“These systems depreciate quickly, while power, cooling, and data center costs continue whether the GPUs are productive or not,” he told TechNewsWorld. “It also ties up capital that could have gone into product, data, or talent.”

“More broadly,” he continued, “it distorts the market by making demand look bigger than actual utilization, which can drive more overbuilding and even more defensive buying.”

Hoarding compute can also impact the broader AI landscape. “It concentrates advantage in the hands of the largest players and makes access harder for startups, researchers, and smaller enterprises,” Ramdeen said. “That can raise prices, slow experimentation, and reduce innovation across the ecosystem.”

“It also creates a market where success depends too much on reserving supply and too little on using it efficiently,” he added. “That is not a healthy long-term structure for the industry.”

Better Management Needed

Ramdeen argued that some hoarding is rational rather than malicious. “It is a predictable response to supply uncertainty,” he said. “But long term, the winners in AI infrastructure will not be the companies that simply stockpile the most GPUs. The winners will be the ones who turn hardware into reliably available, high-utilization compute with better orchestration, better networking, and better economics.”

Lakshya Jain, director of technology at Annaly Capital Management, a mortgage-focused investment firm in New York City, maintained that underutilization is not a problem with technology.

“It is a problem with how companies are organized,” he told TechNewsWorld. “Companies are still learning how to use AI. Until they can get better at managing their AI projects, being responsible with their costs, and making sure everyone is on the same page, they will continue to buy more computing power than they need.”

“The irony of AI compute hoarding is that it undermines the very outcomes these investments are supposed to drive,” added Siddardha Vangala, co-founder and technical advisor at Tiered World Studios, a games and immersive technology company in Salt Lake City.

“Companies are making board-level bets on AI transformation while their infrastructure teams are operating with no utilization targets, no cost accountability frameworks, and no feedback loops connecting spend to production output,” he told TechNewsWorld.

“The Cast AI data isn’t surprising to anyone building real AI systems,” he said. “It’s just now becoming visible at the industry level.”
