Sakana Fugu Matches Anthropic's Fable 5 by Orchestrating Other People's Models — Why Multi-Agent Orchestration Could Reshape the AI Tools You Pick in 2026

📅 June 22, 2026 ⏱️ 8 min read ✍️ aitrove.ai Team

📑 Table of Contents

Introduction: The Orchestrator That Doesn't Build Its Own Brain
What Sakana Fugu Actually Is
Two Models, One API: Fugu vs. Fugu Ultra
The Benchmarks: Topping Fable 5 Without a Frontier Model
The Real Play: Beating Vendor Lock-In and Export Controls
What It Means for the AI Models and Tools You Pick
The Trade-Offs You Should Weigh
The Bottom Line
Frequently Asked Questions

Introduction: The Orchestrator That Doesn't Build Its Own Brain

For three years, the AI race has been defined by a single question: who can train the biggest, smartest frontier model? On June 22, 2026, Tokyo-based Sakana AI offered a contrarian answer. It launched Sakana Fugu — a system that posts top scores against Anthropic's Fable 5 and Mythos Preview, and against GPT 5.5, without training a single frontier model of its own. Instead, Fugu orchestrates a swappable pool of other people's models, coordinating them behind one endpoint so that, from the outside, it behaves like a single super-model.

If you're choosing AI models and tools for your stack, this matters far beyond the benchmark table. Fugu is the highest-profile proof yet that orchestration — routing each task to the right model, verifying the work, and synthesizing the answer — may be catching up to raw model size as the thing that determines how good your AI actually feels. It's also an explicit hedge against a problem a lot of teams are quietly worried about: vendor lock-in and the export controls that can cut you off from a model overnight.

What Sakana Fugu Actually Is

Fugu is a multi-agent orchestration system that behaves like one model. You send a request to a single endpoint, and Fugu decides what to do with it internally. For a simple task, it just answers. For a hard one, it quietly assembles a team of expert models, delegates the work, checks it, and stitches the results together. The complexity of the multi-agent system never reaches your code.

The clever part is that Fugu is itself a language model — one trained specifically to call other LLMs in an agent pool. That pool even includes instances of Fugu itself, called recursively. It manages model selection, delegation, verification, and synthesis on its own, learning when to delegate and how agents should communicate rather than relying on hand-coded pipelines.

The research rests on two ICLR 2026 papers from Sakana: Trinity, a lightweight evolved coordinator that assigns Thinker, Worker, and Verifier roles adaptively across turns, and Conductor, which uses reinforcement learning to discover natural-language coordination strategies for diverse model pools. The takeaway: a system can learn how to assemble and route agents per task, replacing the brittle, hand-designed workflows most teams still maintain.

Two Models, One API: Fugu vs. Fugu Ultra

Fugu ships in two variants, both behind a single OpenAI-compatible API — meaning you point an existing client at your endpoint and there's no SDK migration:

Fugu balances strong performance with low latency. It's the default for everyday coding, code review, and chatbots, and it fits tools like Codex. Crucially, you can opt specific agents out of its pool, which helps teams meet data-residency, privacy, and compliance requirements.
Fugu Ultra is tuned for maximum answer quality on hard, multi-step problems. It coordinates a deeper pool of expert agents, but that pool is fixed — so per-agent opt-out isn't available. The current model ID is fugu-ultra-20260615.

Token usage and cost are reported per request, so you can monitor spend in real time as an orchestrator fans your prompt out across multiple underlying models.

The Benchmarks: Topping Fable 5 Without a Frontier Model

This is the part that got the AI community's attention. Across an 11-row comparison table, Sakana's orchestrator posted the top score on 10 of them:

Model	Where it leads
Fugu Ultra	Tops four coding benchmarks, CharXiv Reasoning, and Humanity's Last Exam
Fugu (standard)	Leads SciCode, τ³ Banking, and Long Context Reasoning; ties Fugu Ultra on GPQA-D
GPT 5.5	The only baseline win — MRCRv2 (long-context retrieval)

Notably, Sakana stresses that Anthropic's Fable 5 and Mythos Preview stand "shoulder-to-shoulder" with Fugu — and they are not even in Fugu's pool, because they aren't publicly accessible. In other words, the orchestrator hit frontier-tier results using a different roster of models than the ones it's being compared against. A beta of nearly 500 early users has leaned on it for exactly the kind of long, multi-step tasks where orchestration pays off.

The Real Play: Beating Vendor Lock-In and Export Controls

Read between the benchmark lines and Fugu's strategic intent is clear. Sakana explicitly frames it as a hedge against single-vendor dependency, citing recent export controls on Anthropic's Fable and Mythos models as motivation. If one provider restricts access or a regulator cuts off a region, an orchestrator can route around the disruption — and newer models can be folded into the pool over time without rewriting your application.

For any team that has felt the sting of a model being deprecated, gated, or geofenced, that portability argument lands hard. Your code talks to one OpenAI-compatible endpoint; what's behind it can evolve as the market does.

What It Means for the AI Models and Tools You Pick

For most buyers, the takeaway isn't "drop your current model for Fugu." It's that the decision is shifting from which single model to which layer you standardize on:

The old question	The question that matters now
Which frontier model is smartest?	Which orchestration layer routes to the best model per task?
Whose API am I locked into?	Can I swap underlying models without rewriting my app?
What's the per-token price?	What does a multi-step task actually cost end-to-end?
How good is the one model?	How good is verification, delegation, and synthesis across many?

Expect this pattern to spread. Routers, gateways, and multi-agent frameworks are racing to add the learned-coordination smarts that Fugu showcases, and the vendors that win won't necessarily be the ones with the biggest model — they'll be the ones that make a portfolio of models feel like one dependable product.

The Trade-Offs You Should Weigh

Why orchestration appeals:

Frontier-tier quality without betting on one model
OpenAI-compatible API — minimal migration
Vendor portability and resilience to export controls
Per-request cost reporting as agents fan out
New models can join the pool over time

What to watch out for:

Per-task cost can be higher when many models are invoked
Latency grows on multi-step, deep-pool jobs
Fugu Ultra's fixed pool limits compliance opt-outs
Debugging a multi-model chain is harder than one model
Quality still depends on which models are in the pool

The Bottom Line

Sakana Fugu is the clearest signal yet that the center of gravity in AI is moving from the model to the orchestrator. Matching Fable 5 and Mythos without training your own frontier model is a genuine milestone — and it reframes how you should shop in 2026. Don't just benchmark models in isolation; ask which layer can route, verify, and synthesize across them, swap them when the market shifts, and give you frontier-tier results through a single OpenAI-compatible endpoint. The teams that standardize on a smart orchestration layer, rather than a single vendor's flagship, will be the ones who weather the next export control, model deprecation, or price hike without rewriting a line of code.

Frequently Asked Questions

What is Sakana Fugu?

Sakana Fugu is a multi-agent orchestration system from Tokyo-based Sakana AI, launched on June 22, 2026. It behaves like a single model: you send a request to one endpoint, and Fugu internally decides whether to answer directly or assemble and coordinate a team of expert models to handle it. Fugu is itself a language model trained to call other LLMs in an agent pool — including itself, recursively.

How does Fugu match Anthropic's Fable 5 without a frontier model?

Fugu routes each task across a swappable pool of frontier LLMs, learning when to delegate, how agents should communicate, and how to verify and synthesize their work. It posts the top score on 10 of 11 benchmark rows tested — including coding, CharXiv Reasoning, and Humanity's Last Exam — even though Anthropic's Fable 5 and Mythos Preview are not in its pool because they aren't publicly accessible.

What's the difference between Fugu and Fugu Ultra?

Fugu balances performance with low latency for everyday coding, review, and chat, and lets you opt specific agents out of its pool for compliance. Fugu Ultra (model ID fugu-ultra-20260615) is tuned for maximum quality on hard, multi-step problems using a deeper, fixed agent pool. Both share a single OpenAI-compatible API.

Why is orchestration a hedge against vendor lock-in?

Because your code talks to one orchestration endpoint rather than a specific model, you can route around a provider that restricts access or gets hit by export controls — and fold newer models into the pool over time without rewriting your application. Sakana explicitly cited recent export controls on Anthropic's Fable and Mythos models as motivation.

Where can I compare AI models, agents, and orchestration tools?

You can browse and compare hundreds of vetted AI models, multi-agent frameworks, coding agents, and API gateways — each evaluated on capability, pricing, and ease of integration — on aitrove.ai.

Compare AI Models, Agents, and Orchestration Tools on aitrove.ai

From frontier LLMs and multi-agent frameworks to OpenAI-compatible routers, gateways, and coding agents, compare hundreds of vetted AI tools side by side — so you can build a stack that's powerful today and resilient to whatever the model market does tomorrow.

Browse All AI Tools →