High-bandwidth memory (HBM) now accounts for 63% of the cost to build an AI chip, up from 52% just 18 months ago. That structural shift, documented by Epoch AI researcher Venkat Somala on May 21, 2026, explains why your generative AI subscriptions and API bills are not getting cheaper in 2026 and likely will not before late 2027.
What Happened
Epoch AI's new data insight tracks the component cost breakdown for AI chips designed by Nvidia, AMD, Google, and Amazon, weighted by production volume, from Q1 2024 through Q4 2025. The analysis covers four categories: HBM stacks, advanced-node logic dies (3-5nm), TSMC CoWoS advanced packaging, and substrate and power components.
The finding is straightforward: memory has taken over the cost structure of AI chips faster than the industry expected. In Q1 2024, HBM consumed roughly half the cost of a frontier AI chip. By Q4 2025, that figure had climbed to 63%. Meanwhile, advanced packaging fell from 19% to 15%, and auxiliary components dropped from 15% to 9%. The rest of the chip is getting cheaper. Memory is not.
In dollar terms, the scale of the shift is striking. Total component spending across all four chip designers grew from $22 billion in 2024 to $52 billion in 2025. HBM's share of that total jumped from $12 billion to $32 billion, an increase of $20 billion in a single year. No other chip component came close to that growth rate.
Why It Matters for Creators
Every image you generate on Runway, every track you produce on Suno, and every voice you synthesize on ElevenLabs runs on infrastructure built around HBM-equipped GPUs. When HBM costs rise, the operating costs of every AI platform rise with them.
The pressure is already visible in capital expenditure guidance. Microsoft outlined $190 billion in FY2026 capex, with roughly $25 billion of that increase attributed directly to component price inflation. Meta raised its 2026 infrastructure budget by $10 billion for similar reasons. Neither company is absorbing these increases: costs flow downstream through subscription pricing and API rate changes.
Creator-facing platforms face the same dynamic at smaller scale. The GPU clusters that power Midjourney, Suno, Runway, and ElevenLabs are all rented from or modeled on the same hyperscaler infrastructure. When AWS, Azure, and Google Cloud raise compute prices, AI platform operators either raise their own prices or accept lower margins. In a competitive market where most AI tools are not yet profitable, the margin route is not viable for most.
This also affects the surprise API bills that creators building on cloud AI APIs encounter. If your workflow involves calling Stability AI's API, OpenAI's image generation endpoint, or ElevenLabs' synthesis API at scale, the unit economics of those calls are determined by infrastructure costs that are rising, not falling.
Key Details: The HBM Cost Breakdown

| Component | Q1 2024 Share | Q4 2025 Share | Dollar Change (YoY) |
|---|---|---|---|
| HBM memory | 52% | 63% | $12B to $32B |
| Logic dies (3-5nm) | 14% | 13% | Roughly flat |
| Advanced packaging | 19% | 15% | Declining |
| Auxiliary components | 15% | 9% | Declining |
Source: Epoch AI, Venkat Somala, May 21 2026. Confidence intervals: HBM 60-67%, logic 10-16%, packaging 11-19%, auxiliary 8-10%.
HBM is a stacked memory architecture that places multiple DRAM dies directly on the same package as the processor, connected through thousands of vertical copper channels. It delivers roughly 10x the memory bandwidth of standard GDDR6 memory, which is essential for running the large matrix multiplications inside diffusion and transformer models. But producing it requires specialized wafer-on-substrate packaging that only a handful of facilities worldwide can handle. SK Hynix, Samsung, and Micron are the only commercial HBM suppliers. Every H100, H200, and B200 GPU in a commercial AI data center runs on their output.
How This Affects Specific Creative AI Tools

The impact is not uniform. The most memory-intensive tools feel it first.
Video generation is at the top of the stack. Models like Runway Gen-4 and Sora process hundreds of frames simultaneously with large spatial attention windows. The memory bandwidth requirement is substantial, which is why video generation remains the most expensive generative AI category per output minute. Pricing for these services is unlikely to fall while HBM costs are rising.
Image generation sits in the middle tier. Diffusion models like Stable Diffusion XL and Flux.1 run multi-step inference passes that stress memory bandwidth less than video but more than text. The GPU infrastructure required for image generation still runs exclusively on HBM-equipped hardware at cloud scale. Expect flat or modestly rising pricing for Midjourney, Adobe Firefly API, and similar services.
Audio models are comparatively lean. Text-to-speech and music generation models are smaller and run efficiently on less memory bandwidth. ElevenLabs and Suno are somewhat insulated from HBM cost pressure compared to video platforms, though they still operate on the same HBM-dependent GPU infrastructure.
Self-hosted tools using consumer GPUs with standard GDDR6 memory are partially outside this dynamic. Local inference on an RTX 4090 or RX 7900 XTX does not consume HBM at all. Quantized models in GGUF and GPTQ formats that fit inside 16-24 GB of VRAM are increasingly worth evaluating as a hedge against rising cloud costs.
What to Do Next
Four practical steps for managing AI tool costs in a rising-memory-cost environment:
- Lock in annual plans before Q3 pricing reviews. Most AI platforms set or review pricing in Q1 and Q3. With component costs still climbing, platforms that held pricing flat in Q1 2026 are likely to revisit rates in late summer. If you rely heavily on Runway, ElevenLabs, or Suno, annual plans lock in today's rates for 12 months.
- Move batch work to async API modes. Several platforms offer discounted batch processing for non-urgent requests. Shifting generation jobs that do not need instant results, such as batch image upscaling, bulk voice synthesis, or overnight video renders, to batch API endpoints can reduce per-call costs significantly without changing your creative output.
- Explore quantized local models for routine tasks. For high-throughput or repetitive generation work, quantized local models (4-bit and 8-bit formats) running on consumer hardware have narrowed the quality gap with cloud APIs for everyday tasks. Tools like LM Studio and ComfyUI with quantized models let you run inference at zero per-call cost after the initial hardware investment. Use cloud APIs for the highest-quality output where it matters; use local for everything else.
- Track memory cost data quarterly. Epoch AI publishes updated cost share analysis on a rolling basis at epoch.ai/data-insights. Monitoring HBM cost trajectory gives you 2-3 months of lead time before pricing changes reach creator tool subscriptions and API tiers.
The 2026 Outlook

Epoch AI projects that memory's share of AI chip costs will continue rising through 2026. Samsung, SK Hynix, and Micron are all expanding HBM production capacity, but new fab capacity takes 18-24 months to come online. Supply relief is not expected before late 2027. Until then, the economics of AI infrastructure remain tilted toward higher costs, not lower ones.
The broader implication for creators is a shift in strategy. Waiting for AI tool prices to drop is not a reliable plan for 2026. The better approach is optimizing how you consume AI tools: batching workloads, running local inference where quality permits, and locking in pricing before the next round of increases.
Frequently Asked Questions
What is HBM (high-bandwidth memory) in simple terms?
HBM is a type of computer memory where multiple memory chips are stacked on top of each other and mounted directly on the same chip package as the processor. This layout delivers far more memory bandwidth than conventional memory, which is essential for the matrix calculations that power AI models. It is made by three companies: SK Hynix, Samsung, and Micron.
Why is HBM so expensive compared to regular memory?
HBM requires advanced 3D stacking and a specialized packaging process called CoWoS (chip-on-wafer-on-substrate), available only at a small number of facilities, primarily TSMC. The limited production capacity relative to surging demand from Nvidia, AMD, Google, and Amazon keeps prices structurally high.
Will AI tool prices drop when HBM supply increases?
Not automatically or immediately. Memory costs are one component of platform operating expenses, alongside data center power, software development, and support costs. Even if HBM prices normalize in 2027-2028, platforms may maintain current pricing to improve margins rather than passing savings directly to users. Price drops in the creator tool market are more likely to come from competitive pressure between platforms than from hardware cost deflation.
Which creative AI tools are most affected by rising HBM costs?
Video generation tools (Runway, Pika, Sora) face the highest memory demands because video models process hundreds of frames with large attention windows. Image generation tools (Midjourney, Stable Diffusion API services, Adobe Firefly) are moderately affected. Audio tools (ElevenLabs, Suno, Stable Audio) are the least memory-intensive in relative terms but still operate on the same HBM-dependent GPU infrastructure.
Does this affect local AI tool costs for creators?
Consumer GPU cards use GDDR6 memory, not HBM, so the direct cost relationship is weaker. However, overall GPU demand from AI data centers keeps consumer GPU prices elevated. If you are planning a local inference setup for tools like ComfyUI or voice synthesis, current GPU prices are unlikely to fall significantly over the next 12 months. Buying to support a specific workflow today is reasonable rather than waiting for discounts that may not arrive.