Runway published an engineering post on April 27, 2026, detailing how the company behind Gen-4 video generation and Act-One motion capture raised GPU utilization on its research compute by more than 20 percentage points, to roughly twice the industry average, without adding hardware or requiring manual capacity management from team leads.

What Happened

In the post, Runway Platform engineers Matt Kafonek and Brannon Dorsey describe a shift from ad-hoc GPU reservation to structured queue-based scheduling built on Kueue, an open-source Kubernetes-native job queueing system maintained by Kubernetes SIG Scheduling.

Before the change, Runway faced a utilization problem common to AI research labs: teams' reserved capacity sat idle whenever those teams were between jobs, while exploratory workloads from other teams queued for compute they could not touch. The result was low aggregate utilization even as some teams significantly underused their allocations.

The solution uses two queue types. Reserved queues give each critical team guaranteed capacity: they cannot borrow from other queues, but they can always reclaim their own capacity on demand. A shared default queue borrows idle GPU capacity from any reserved queue not currently using its full allocation. When a reserved team reclaims capacity, workloads in the shared pool are preempted. Runway accepts this tradeoff because most research experiments can checkpoint and restart; the main cost is that preemption can introduce 20- to 30-minute delays due to Kubernetes pod termination grace periods.
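
The two queue types map naturally onto Kueue's ClusterQueue API. The sketch below is illustrative only; the queue names, cohort name, flavor, and quota values are assumptions, not Runway's actual configuration:

```yaml
# Reserved queue: guaranteed quota, zero borrowing, reclaims lent capacity.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-gen4-reserved        # illustrative team queue name
spec:
  cohort: research                # queues in a cohort can lend idle capacity
  preemption:
    reclaimWithinCohort: Any      # evict borrowers when this team needs GPUs back
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor        # assumes a ResourceFlavor of this name exists
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 64          # guaranteed capacity (assumed value)
        borrowingLimit: 0         # zero borrowing from other teams
---
# Shared default queue: no guaranteed quota, runs entirely on borrowed capacity.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: shared-default
spec:
  cohort: research
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 0           # nothing guaranteed; preemptible by design
        # borrowingLimit omitted -> may borrow up to the cohort's idle quota
```

Placing both queues in one cohort is what enables lending: the shared queue's workloads are admitted against whatever quota the reserved queues are not using, and `reclaimWithinCohort: Any` is what lets a reserved queue take its capacity back by preempting them.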

Why It Matters

Runway's creative AI products ship research features on a continuous cadence. Gen-4 video generation, Frames, and Act-One motion capture all depend on rapid iteration between research experiments and model updates. The speed at which those products reach creative professionals is directly tied to how efficiently the underlying team can run training cycles.

According to Runway's published data, GPU utilization increased by more than 20 percentage points. The engineering post states the result is "approximately 2x industry norms," a meaningful gap given both the cost of GPU compute and the direct relationship between utilization rates and how many experiments a research team can run per month.

The change also eliminated the need for team leads to manually track and negotiate GPU allocations, a time-consuming coordination overhead that scales poorly as teams grow.

Key Details

  • Tool: Kueue (open source, maintained by Kubernetes SIG Scheduling)
  • Result: 20-plus percentage point increase in GPU utilization; approximately 2x industry average
  • Reserved queues: guaranteed capacity for critical teams, zero borrowing, automatic reclaim
  • Default queue: opportunistic pool that borrows idle capacity across all reserved queues
  • Tradeoff: preemption can delay shared workloads by 20 to 30 minutes
  • Published by Matt Kafonek and Brannon Dorsey, Runway Platform team
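
The preemption tradeoff above only works because jobs are written to survive eviction. A minimal sketch of that checkpoint-and-restart pattern is shown below; the file path, class, and step logic are illustrative, not Runway's code:

```python
import json
import os
import signal

CKPT_PATH = "/tmp/ckpt.json"  # in practice: shared storage, e.g. a PVC


class PreemptibleTrainer:
    """Toy training loop that checkpoints on SIGTERM and resumes on restart.

    When Kueue preempts a pod, the kubelet sends SIGTERM and the termination
    grace period begins; the job saves state and exits cleanly, then picks up
    from the last checkpoint on its next admission.
    """

    def __init__(self, total_steps, ckpt_path=CKPT_PATH):
        self.ckpt_path = ckpt_path
        self.total_steps = total_steps
        self.step = self._load()          # resume if a checkpoint exists
        self.stop = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Finish the current step, checkpoint, and exit within the grace period.
        self.stop = True

    def _load(self):
        if os.path.exists(self.ckpt_path):
            with open(self.ckpt_path) as f:
                return json.load(f)["step"]
        return 0

    def _save(self):
        with open(self.ckpt_path, "w") as f:
            json.dump({"step": self.step}, f)

    def run(self):
        while self.step < self.total_steps and not self.stop:
            self.step += 1                # stand-in for one training step
            self._save()                  # real jobs checkpoint less frequently
        return self.step
```

A preempted run restarted later simply constructs the trainer again and continues from the persisted step, which is why a 20- to 30-minute delay costs wall-clock time but not completed work.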

What to Do Next

For teams running AI model training or research on shared Kubernetes infrastructure, the queue-borrowing pattern Runway describes is directly replicable with the same open-source tooling. The key architectural insight is that guaranteed capacity and opportunistic borrowing are not mutually exclusive when preemption is enabled.
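
Wiring workloads into such queues takes a namespaced LocalQueue plus a label on each Job. Again a sketch under assumed names (namespace, queue binding, and image are placeholders):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default                   # what users reference from their Jobs
  namespace: research             # assumed namespace
spec:
  clusterQueue: shared-default    # binds to the cluster-wide shared pool
---
apiVersion: batch/v1
kind: Job
metadata:
  name: exploratory-train         # illustrative workload
  namespace: research
  labels:
    kueue.x-k8s.io/queue-name: default  # Kueue admits Jobs via this label
spec:
  suspend: true                   # Kueue unsuspends once quota is available
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: train
        image: registry.example.com/train:latest   # assumed image
        resources:
          requests:
            nvidia.com/gpu: 8
          limits:
            nvidia.com/gpu: 8
```

Jobs submitted this way start suspended; Kueue unsuspends them when quota (guaranteed or borrowed) is free, which is the mechanism that replaces manual allocation tracking by team leads.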