Developers juggling Claude Code, Codex, and Cursor now have a single switch to control which model answers each request. Workweave has open-sourced Weave Router, a drop-in proxy that reads each prompt and routes it to the model that gives the best answer for the lowest price, all in under 50 milliseconds. The pitch for builders is simple: change one endpoint, keep your existing tools, and cut your model bill by 40 to 70 percent.

What Happened

Weave Router launched as an open-source proxy that sits between your coding agent and the model providers. It speaks the Anthropic Messages, OpenAI Chat Completions, and Gemini formats, so tools like Claude Code, Codex, and opencode connect without code changes, with Cursor support in early beta. The project homepage bills it as the number one ranked prompt router in the world, and it is free to self-host under the Elastic License v2.

Why It Matters

Most teams pay premium-model prices for prompts a cheaper model could handle just as well. A router flips that default: simple edits go to fast, inexpensive models while hard reasoning tasks still reach the frontier. That is the same idea behind multi-model orchestration platforms like Sakana's Fugu API, but Weave Router runs locally and keeps your provider keys on-device, encrypted at rest. Workweave, which describes itself as an engineering intelligence platform, is positioning the router as cost control you own rather than another middleman taking a cut.

Key Details

  • Routing decisions land in under 50 milliseconds, with a claimed 40 to 70 percent cost reduction from a single endpoint change.
  • Ranks first on the RouterArena leaderboard with an accuracy-cost score of 76.09.
  • Routes to open models including DeepSeek, Kimi, GLM, Qwen, Llama, and Mistral through OpenRouter, alongside the three major providers.
  • Provider keys stay on-device and encrypted at rest, and OTLP traces feed a local observability dashboard.
  • RouterArena itself grew out of recent published research on standardized, reproducible router evaluation.

What to Do Next

If you run several coding agents, you can try Weave Router without rewriting anything. Spin it up with npx @workweave/router or self-host it with Docker and Postgres, then point your agent's base URL at the local proxy. Start on a low-stakes project, watch the dashboard to see which prompts get downgraded to cheaper models, and confirm answer quality holds before you route production work through it. If the savings land anywhere near the claimed range, one endpoint change pays for itself fast.