Book a Free Strategy Call
Skip the read — talk to Walid in 30 min.
Free strategy call. We map your AI engineering team, you keep the notes.
Maestro vs Sakana Fugu: Open-Source vs Closed LLM Orchestration (2026)
Sakana Fugu and Maestro solve the same problem — orchestrating many LLMs behind a single endpoint — from opposite philosophies: Fugu is a closed, trained, managed orchestrator with published benchmarks, while Maestro is an open-source, transparent, self-hosted one you control. If you are searching for maestro vs sakana fugu (or sakana fugu vs maestro), the honest answer is that they are not really competing on the same axis. Pick by what you value more: a mature managed system with benchmark leadership, or full control, cost transparency, and the ability to self-host. This guide compares both fairly so you can decide.
TL;DR
- Sakana Fugu is a trained multi-agent orchestration model from Sakana AI, launched June 22, 2026. One OpenAI-compatible API routes your task across a pool of frontier LLMs. It is closed, managed, and benchmark-leading — but the per-query routing is hidden and the model pool is fixed.
- Maestro is "the open-source orchestration brain for LLMs." MIT-licensed, self-hostable, and built by AY Automate. It routes cheap-first, verifies, then escalates, and returns full cost receipts. It is honest about being a v0.1 early build, not production-hardened.
- Choose Fugu if you want a managed, trained orchestrator with published benchmarks and no infrastructure to run.
- Choose Maestro if you want an open source Sakana Fugu alternative you fully control: any model pool, visible routing, real cost receipts, and self-hosting.
- Maestro does not claim to beat a single frontier model on raw quality. Fugu does, with numbers. The trade is control and transparency vs managed maturity.
If you are new to the category, start with what LLM orchestration is and what Sakana Fugu is.
What Each One Is
Sakana Fugu
Sakana Fugu is an orchestration model, not a wrapper. Launched on June 22, 2026 by Sakana AI, it exposes a single OpenAI-compatible API. Behind that endpoint, Fugu performs selection, delegation, verification, and synthesis across a pool of frontier LLMs — internally, without you having to wire anything up.
It ships in two variants: Fugu (balanced) and Fugu Ultra (maximum quality, model id fugu-ultra-20260615). It is built on two ICLR 2026 papers — Trinity (a Thinker/Worker/Verifier structure) and Conductor (reinforcement-learned coordination) — so it is genuinely trained to orchestrate, not a naive prompt router.
Sakana's own benchmarks are strong: Fugu Ultra leads 10 of 11 evals it reports, including SWE-bench Pro at 73.7 (versus Opus 4.8 at 69.2 and GPT-5.5 at 58.6); GPT-5.5 wins on MRCRv2. For the full picture see Sakana Fugu's benchmarks vs Fable 5 and our Sakana Fugu review.
The catch is in what you give up. Fugu is closed and proprietary. The per-query routing is hidden — you cannot see or control which model actually answered. Fugu Ultra's pool is fixed, with no opt-out. Token cost is opaque, and there are no public pricing figures (subscription plus usage-based). It is not self-hostable. And Sakana claims parity with Fable 5 and Mythos — models that are not even in its pool because they are export-controlled.
Maestro
Maestro is "the open-source orchestration brain for LLMs." It is MIT-licensed and self-hostable, with the repo at github.com/walidboulanouar/maestro and a site at maestro.ayautomate.com. It is built by AY Automate (which we disclose plainly).
Its routing strategy is cheap-first → verify → escalate: try an inexpensive model, verify the answer, and only escalate to a stronger (more expensive) model when needed. Every response returns a maestro block — the receipts — showing the route decisions, per-model tokens, and total cost.
The model pool is "100% yours." You register any open, closed, or local models — OpenAI, Anthropic, OpenRouter, Vercel AI Gateway, Ollama, vLLM, llama.cpp — through a JSON registry. It is OpenAI- and Anthropic-wire compatible, so you point a base URL at localhost:8080/v1 and use it from Claude Code, Cursor, or Continue. You run it with npx openmaestro serve or Docker, no GPU, one API key.
It exposes modes: maestro-auto, maestro-fugu (a single worker plus verify), and maestro-ultra (on the roadmap).
Maestro is honest about its status: it is v0.1, an early "~5-hour build." The core works and has been tested live, but it is not production-hardened, and the learned router is not built yet — routing is currently a heuristic classifier. Its own offline benchmark shows ~92% pass at ~$0.00053 per task versus the best single model at 100% at ~$0.01507 — cheaper per success, but it does not claim to always beat a single frontier model.
Head-to-Head Comparison
| Dimension | Sakana Fugu | Maestro |
|---|---|---|
| License | Closed / proprietary | MIT, open-source |
| Self-host | No | Yes (npx openmaestro serve or Docker, no GPU) |
| Routing transparency | Hidden — you can't see which model answered | Full — maestro receipts per response |
| Model pool | Fixed (Fugu Ultra pool can't be changed) | 100% yours — any open/closed/local via JSON registry |
| Cost visibility | Opaque token cost, no public pricing | Per-model tokens + total cost on every response |
| Maturity | Launched, trained, benchmarked | v0.1 early build, not production-hardened |
| Benchmarks | Leads 10/11 of its own evals (e.g. SWE-bench Pro 73.7) | ~92% pass at ~$0.00053 (cheaper per success, not top quality) |
| Compatibility | OpenAI-compatible API | OpenAI- and Anthropic-wire compatible |
Routing Philosophy: Trained Orchestrator vs Transparent Heuristic
This is the deepest difference, and it cuts both ways.
Fugu is a trained orchestrator. The Trinity structure (Thinker/Worker/Verifier) and Conductor (RL coordination) mean the routing logic was learned — Fugu has, in a real sense, been optimized to know when to delegate, when to verify, and how to synthesize across models. That is hard engineering and it shows in the benchmarks. The cost is that the logic is a black box: you trust the model to route well, and you do not get to see or tune the decision.
Maestro takes a transparent, rule-driven approach: cheap-first, verify, escalate. There is no learned router yet — today it is a heuristic classifier, which Maestro states openly. That makes it less sophisticated than Fugu's trained coordination, but it makes the behavior legible and tunable. You can read exactly why a request escalated, and you can change the rules because the code is yours.
So the contrast is "a trained orchestrator that routes well but opaquely" versus "a transparent router you own and can reason about." Neither is strictly better; they optimize for different things.
Transparency and Control
With Fugu, the pool is fixed and the per-query routing is hidden. You send a task; an answer comes back. You cannot confirm which model produced it, cannot opt a model out of the Ultra pool, and cannot see the token-level cost breakdown. For many teams that is an acceptable trade for a managed, high-quality result.
With Maestro, every response carries the maestro block — the receipts. Route decisions, per-model token counts, and the computed cost are all returned inline. The pool is defined by your JSON registry, so you decide exactly which models are eligible, including fully local ones via Ollama, vLLM, or llama.cpp. If your requirement is auditability — knowing what ran, where, and at what cost — Maestro is built for that and Fugu is not.
Cost
Fugu's pricing is a subscription plus usage-based fees, with no public figures and opaque token costs. You will know your bill, but not necessarily the per-query economics behind it.
Maestro's cost story is the receipts. Its cheap-first strategy is designed to minimize spend: in its own offline benchmark it reached ~92% pass at ~$0.00053 per task versus ~$0.01507 for the best single model — roughly an order of magnitude cheaper per successful task, at the cost of a few percentage points of pass rate. Because you bring your own keys and can route to local models, your marginal cost can approach zero for tasks a local model handles. The honest caveat: Maestro trades a little quality for a lot of cost savings, and that trade-off is yours to tune.
Maturity and Honesty
Be clear-eyed here. Fugu is the more mature, more capable system. It launched, it is trained, and it ships published benchmarks where Fugu Ultra leads 10 of 11 of its own evals. It is a real product from a serious lab.
Maestro is a v0.1 early build — described by its authors as a "~5-hour build." The core works and has been tested live, but it is explicitly not production-hardened, and the learned router is still on the roadmap. We will not pretend otherwise: if you need a battle-tested managed orchestrator today, Fugu is further along.
What Maestro offers instead is the open, transparent, self-hosted model that Fugu structurally cannot — and a codebase you can extend yourself. The fair framing is "managed maturity and benchmark leadership" versus "open, controllable, early." For more on the broader landscape, see Sakana Fugu alternatives.
Which Should You Use?
A simple decision framework:
- You want the best managed quality with no infrastructure → Sakana Fugu. The benchmarks are real and you do not run anything.
- You need to self-host (data residency, air-gapped, compliance) → Maestro. Fugu cannot be self-hosted.
- You need cost transparency and per-query receipts → Maestro. Fugu's routing and token costs are hidden.
- You want to control the exact model pool, including local models → Maestro. Fugu Ultra's pool is fixed.
- You need something production-hardened today → Fugu. Maestro is v0.1.
- You want an open-source foundation you can fork and extend → Maestro. It is MIT-licensed.
- You want to experiment with cheap-first orchestration to cut spend → Maestro, with eyes open about its early status.
If you are evaluating Fugu specifically, read our Sakana Fugu review alongside this.
Bottom Line
Maestro vs Sakana Fugu is not a contest of one beating the other on a single metric — it is a choice of philosophy. Fugu is a closed, trained, managed orchestrator with strong published benchmarks and no infrastructure for you to run, at the cost of transparency and control. Maestro is an open-source, self-hosted, fully transparent orchestrator you own end to end, at the cost of maturity — it is an honest v0.1. If you value managed benchmarks and convenience, choose Fugu. If you value control, transparency, cost receipts, and self-hosting, Maestro is the open source Sakana Fugu alternative built for exactly that.
FAQ
Is Maestro a Sakana Fugu alternative? Yes, in philosophy. Both put many LLMs behind one endpoint and orchestrate routing. Maestro is the open source Sakana Fugu alternative: MIT-licensed, self-hostable, with visible routing and cost receipts. It is not a drop-in replacement for Fugu's trained quality — it is an early v0.1 — but it covers the same job from an open, controllable angle.
Is there an open-source Sakana Fugu? Sakana Fugu itself is closed and proprietary, so there is no open-source Fugu. Maestro is the closest open-source equivalent in spirit: it does multi-model orchestration behind an OpenAI/Anthropic-compatible API, but it is an independent project from AY Automate, not Sakana's code.
What is open source about Sakana Fugu? The orchestration research is public — Fugu builds on the ICLR 2026 Trinity and Conductor papers. The Fugu product and its weights are not open source. So "open source Sakana Fugu" really means the published ideas, not a runnable open model. If you want runnable open orchestration, use Maestro.
Is Maestro production-ready? No. Maestro is v0.1, an early "~5-hour build." The core works and has been tested live, but it is not production-hardened, and the learned router is still on the roadmap (routing is currently a heuristic classifier). Use it for experiments and self-hosted prototypes; do not assume production reliability yet.
Can I self-host Sakana Fugu?
No. Fugu is a closed, managed, API-only service from Sakana AI. There is no self-host option, and the model pool behind it is fixed. If self-hosting is a hard requirement, Maestro is the option — you run it with npx openmaestro serve or Docker, no GPU required.
Does Maestro beat Sakana Fugu on benchmarks? No, and it does not claim to. Fugu Ultra leads 10 of 11 of its own evals on raw quality. Maestro optimizes for cost per success: ~92% pass at ~$0.00053 per task versus ~$0.01507 for the best single model. Different goal — cheaper successful answers, not top-of-leaderboard quality.
Which is cheaper, Maestro or Sakana Fugu? Maestro is designed to be cheaper because of its cheap-first strategy and because you bring your own keys (including free local models). It also shows you the exact per-query cost. Fugu's pricing is subscription plus usage with no public figures and opaque token costs, so direct comparison is hard — but Maestro gives you the receipts to control spend.
Can Maestro use any model, like Fugu? Maestro's pool is "100% yours" — any open, closed, or local model via a JSON registry (OpenAI, Anthropic, OpenRouter, Vercel AI Gateway, Ollama, vLLM, llama.cpp). Fugu's Fugu Ultra pool is fixed with no opt-out. So Maestro is more flexible on the pool, while Fugu's pool is curated and trained against.
Need This in Production?
AY Automate builds and self-hosts production multi-model orchestration and failover for teams that need control over routing, cost, and data residency — Maestro is our open-source take on the problem, and we are honest that it is still early. If you want a hardened, monitored orchestration layer running on your own infrastructure, see AI agent development.
Sources
Book a Free Strategy Call
Building this in production?
Walid runs a 30-min call to map your AI engineering team. Free, no slides.

Adel keeps the engine running at AY Automate. He owns internal processes, team coordination, and the operational excellence that lets us ship fast for clients.



