Sakana Fugu: One API to Orchestrate Top AI Models

Sakana AI shipped Sakana Fugu on June 22, 2026, and it inverts the usual model launch. Instead of one bigger model, Fugu is a multi-agent system that behaves like a single model: you send a request to one OpenAI-compatible endpoint, and Fugu decides which frontier models to call, how to split the work, and how to stitch the answers back together. For anyone who builds with AI, whether you are wiring a website, an app, a content pipeline, or an agent, this is the orchestration layer arriving as a product you can call in one line.

The headline claim: Sakana's top tier, Fugu Ultra, "stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview" on rigorous engineering and reasoning benchmarks, without those models even being in its pool. Here is what Fugu actually is, what it costs, how it scores, and why a tool that looks like developer infrastructure belongs in every builder's toolkit.

What Sakana Fugu Is

A central hub routing connections to a pool of smaller nodes — Fugu presents one endpoint while coordinating a pool of models behind it.

Fugu is what Sakana calls a "Multi-Agent System as a Model." It is itself a language model, but its job is not to answer your prompt directly. It is trained to call other LLMs in an agent pool, including instances of itself recursively, and to manage model selection, delegation, verification, and synthesis internally. The complexity of running a multi-agent system never reaches your code. You get one answer from one endpoint.

There are two tiers. Plain Fugu balances strong performance with low latency, pitched as the default for everyday work like coding, code review, and chatbots. Fugu Ultra coordinates a deeper pool of expert agents and is tuned for maximum answer quality on hard, multi-step problems. Both are reached through the same OpenAI-compatible API, so any client built for OpenAI can point at Sakana's endpoint with a config change.

How It Works: TRINITY and the Conductor

A conductor's baton over three blocks representing coordinated roles — A small conductor model assigns roles and coordinates a pool of larger models.

Fugu is the productized version of two Sakana research papers accepted at ICLR 2026. The first, TRINITY, uses a lightweight evolved coordinator that orchestrates multiple LLMs over several turns, assigning each one a Thinker, Worker, or Verifier role and adapting the delegation to the task. The second, the Conductor, is trained with reinforcement learning to discover natural-language coordination strategies: rather than engineers hand-designing the prompts and routing, the system learns how to talk to its agents.

The detail that makes Fugu interesting is the size of the brain doing the conducting. Per the Conductor paper, the orchestrator is roughly a 7-billion-parameter model, and a 7B Conductor "achieves significant performance gains beyond any individual worker." A small, cheap model learns to direct a pool of expensive frontier models and, in Sakana's results, gets more out of them than any one of them delivers alone.

The Benchmarks: How Fugu Stacks Up

Sakana publishes a head-to-head table on the Fugu product page, comparing Fugu and Fugu Ultra against Opus 4.8, Gemini 3.1 Pro, and GPT-5.5. These are vendor-reported numbers, not independently verified, so read them as Sakana's claims. The pattern is consistent: the orchestrated system leads or ties on most of the coding and reasoning suites.

Benchmark	Fugu	Fugu Ultra	Opus 4.8	Gemini 3.1 Pro	GPT-5.5
SWE-Bench Pro	59.0	73.7	69.2	54.2	58.6
TerminalBench 2.1	80.2	82.1	74.6	70.3	78.2
LiveCodeBench	92.9	93.2	87.8	88.5	85.3
Humanity's Last Exam	47.2	50.0	49.8	44.4	41.4
GPQA Diamond	95.5	95.5	92.0	94.3	93.6

There is a twist Sakana is quietly proud of. Fable 5 and Mythos Preview, the models Fugu claims parity with, are not in its agent pool, because they are not publicly accessible. Fugu reaches that tier by orchestrating a pool of available models like Opus 4.8, Gemini 3.1 Pro, and GPT-5.5. The pitch is that coordination, not raw model access, closes the gap.

What It Costs

Fugu has two pricing modes. Subscriptions run at three tiers: Standard at 20 dollars a month for lightweight daily use, Pro at 100 dollars a month for roughly ten times the usage, and Max at 200 dollars a month for about twenty times. All three include both Fugu and Fugu Ultra. Sakana is also running a launch promotion: subscribe before the end of July 2026 and you get a free second month.

For heavier and programmatic work, Fugu Ultra has pay-as-you-go rates of 5 dollars per million input tokens, 30 dollars per million output tokens, and 0.50 dollars per million cached input tokens. One caveat worth stating precisely, because secondary write-ups have garbled it: pricing does not simply "double" past a long context. Above 272,000 tokens of context, input rises to 10 dollars and cached input to 1 dollar per million (a 2x bump), while output rises to 45 dollars per million (a 1.5x bump). Long-context jobs cost more, but not uniformly twice as much.

Why This Matters for Creators and Builders

Charcoal building blocks assembling with one orange block snapping in — Orchestration is the layer underneath anything a builder makes with AI.

It is tempting to file Fugu under "developer infrastructure" and move on. That is the wrong read. Every person building something with AI right now, a landing page, a short film, a game prototype, an automation, a research brief, is already doing manual orchestration: trying a prompt in one model, pasting the result into another, switching tools when one stalls. Fugu turns that manual model-shopping into a single call. You stop picking the model and start describing the outcome.

That changes the day-to-day for builders in a concrete way. A creator who codes a site no longer has to decide whether today's task belongs to a coding specialist, a reasoning model, or a fast generalist. The orchestrator makes that call per request and verifies its own work before answering. For multi-step creative pipelines, the same property means fewer dropped handoffs between tools, because the handoffs happen inside one endpoint instead of across a creator's browser tabs.

How to Try Fugu This Week

Getting started takes minutes if you already use any OpenAI-compatible client.

1. Sign in. Create an account at console.sakana.ai. There is no waitlist for the public release.

2. Point your existing client at Sakana. Because the API is OpenAI-compatible, you change the base URL and the model name (for example, fugu-ultra-20260615) in whatever OpenAI SDK or tool you already run. No rewrite required.

3. For coding, use the CLI. Sakana ships a command-line coding agent. The public GitHub repo installs it with a one-line script and runs it as the codex-fugu command, so you can drive Fugu from a terminal the way you would any coding agent.

4. Start on a real task. Code review is the clearest early win Sakana highlights, since the Verifier role is built into the orchestration. Hand it a pull request or a messy function and compare what it flags against your usual single-model tool.

The Catch

Three things to keep in mind before you commit a workflow to it. First, Fugu is not available in the EU or EEA at launch while Sakana works toward GDPR compliance, so European builders are blocked for now. Second, the benchmark numbers are Sakana's own, run in June 2026, and have not been independently reproduced. Third, orchestration has a cost shape worth watching: a single Fugu request can fan out into several frontier-model calls under the hood, so a "one" request is not always one model's worth of latency or spend. For high-volume, latency-sensitive loops, measure before you assume it is cheaper than calling a single model directly.

Frequently Asked Questions

What is Sakana Fugu?

Sakana Fugu is a multi-agent orchestration system delivered as a single model. It exposes one OpenAI-compatible API, and behind that endpoint it routes each request across a pool of frontier LLMs, handling model selection, delegation, verification, and synthesis for you. It became generally available on June 22, 2026.

How is Fugu different from a normal LLM API?

A normal API sends your prompt to one model. Fugu is itself a model trained to coordinate other models: for each request it builds a plan, assigns roles, calls the right agents (including itself), checks the result, and returns one answer. You describe the outcome instead of choosing a model.

Is Sakana Fugu open source?

The command-line client and a technical report are public on the SakanaAI/fugu GitHub repo, but Fugu itself runs as a hosted service through Sakana's API. The orchestrator is not released as open weights. The underlying research, the TRINITY and Conductor papers, is published on arXiv.

How much does Sakana Fugu cost?

Subscriptions are 20, 100, and 200 dollars a month (Standard, Pro, Max), all including both models. Fugu Ultra pay-as-you-go is 5 dollars per million input tokens and 30 dollars per million output tokens, rising to 10 and 45 dollars respectively above a 272,000-token context.

Can I use Fugu in the EU?

Not yet. Fugu is available from outside Japan but does not serve users in EU or EEA member states at launch, pending GDPR and EU-specific compliance work.

What are TRINITY and the Conductor?

They are the two ICLR 2026 papers Fugu is built on. TRINITY is an evolved coordinator that assigns Thinker, Worker, and Verifier roles to a pool of models. The Conductor is a roughly 7-billion-parameter model trained with reinforcement learning to discover, in natural language, how to coordinate those agents.

Related deep dives

Fugu is the clearest sign yet that the next competitive frontier is not a bigger model but a smarter conductor. If a 7-billion-parameter orchestrator can squeeze frontier-level results out of a pool of existing models, the advantage shifts from who trains the largest network to who routes the best. For builders, that is a future where you stop shopping for models and start shopping for outcomes, and the tool that picks the model for you becomes as fundamental as the editor you write in.