6 Sources
[1]
Anthropic's Fable Five ban exposed AI's next big problem -- but Sakana's Fugu may have the answer
When Anthropic's Fable 5 disappeared just days after launch it proved relying on a single AI isn't sustainable For the past few years, the AI industry has been obsessed with building bigger, smarter and more capable models. Now, a new release from Sakana AI suggests the next AI arms race may not be about building the best model at all, but rather building the best system for managing multiple models. As Anthropic's Fable 5 and Mythos models generate intense discussion across the industry about performance, capabilities and access, Sakana AI has introduced a new approach with Fugu. The timing of this release is especially interesting as users are increasingly stacking models and utilizing several AI agents at once. What is Sakana Fugu? Unlike ChatGPT, Claude or Gemini, Fugu is not trying to be the smartest model in the room. Instead, it acts more like an AI project manager. When a user submits a task, Fugu analyzes the request, decides which AI models are best suited for different parts of the problem, routes work to those models, evaluates the responses and combines the results into a final answer. It's similar to a manager assembling a team of specialists instead of relying on a single employee. Because we all know one model isn't good at everything. Instead, one model might be better at coding, while another might excel at reasoning or writing. Simply put, Fugu's job is to determine who should do what and then stitch everything together. According to Sakana AI's website, this orchestration approach allows the system to achieve performance comparable to leading frontier models without depending entirely on a single model provider. Why this matters Most people think of AI competition as a race to build the biggest and most powerful model, but Fugu points toward a different possibility. Instead of models attempting to outperform each other individually, what if the future belongs to systems that know how to combine multiple models effectively. And while this concept is anything but new, what makes Sakana stand out is it has trained the orchestration process itself and made the routing intelligence the centerpiece of the product. In other words, it made the coordinator as important as the workers. The lesson from Fable The conversation surrounding Anthropic's Fable models highlighted something many organizations are beginning to recognize and that's relying on a single AI provider can create challenges. When access changes, outages occur, pricing shifts or capabilities evolve, entire workflows can be affected overnight. Systems like Fugu are designed to reduce that dependency. Rather than building around one model, they build around an ecosystem of models. So now, if one model becomes unavailable, another can potentially take its place. If a better model emerges tomorrow, it can theoretically be added to the mix. That flexibility could become increasingly valuable as the AI landscape grows more competitive. The takeaway Don't get me wrong: model size, benchmark scores and raw capability still matter. But Sakana's Fugu hints at a future where the most important question isn't "Which model is best?" but rather "Which system is best at choosing the right model?" Fugu suggests the next phase of competition may look very different. Instead of creating a single AI that does everything, the winners may be the companies that can assemble, coordinate and optimize entire teams of AIs behind the scenes. If that's the direction the industry is heading, the next breakthrough might be an AI smart enough to know when not to answer the question itself. Follow Tom's Guide on Google News and add us as a preferred source to get our up-to-date news, analysis, and reviews in your feeds. Subscribe to Tom's Guide on YouTube and follow us on TikTok. Finally, you can visit our dedicated Tom's Guide Savings Squad hub for expert help on getting the best products for less.
[2]
No Claude Fable 5? No problem: Sakana achieves frontier performance with new Fugu multi-model, auto synthesis system
Last night, the increasingly enterprise-focused AI startup Sakana launched Fugu, a multi-agent orchestration system that delivers frontier-level AI performance through a single, OpenAI-compatible API. Designed for developers, enterprises, and nations seeking resilience against vendor lock-in and geopolitical export controls, Fugu (Japanese for "pufferfish"), bypasses the traditional monolithic model structure by dynamically routing queries to a swappable pool of specialized AI agents. Sakana CEO and co-founder David Ha, formerly of Google Brain, positioned Fugu as a more reliable option for enterprise workflows than any single AI model provider in the wake of Anthropic's move on June 12 to revoke public access to its most powerful models, Claude Mythos 5 and Claude Fable 5, in the wake of a U.S. government export control order. As Ha wrote in a post today on X: "Fugu dynamically orchestrates the world's best models to tackle complex tasks. We are proving that a well-orchestrated pool of swappable agents can match restricted frontier models like Fable and Mythos. But Fugu is about more than just performance. I believe that Orchestration Models are the next frontier, beyond bigger models. Relying on a single company's model for national infrastructure is a massive risk. As recent export controls have shown, access to top models can disappear overnight. Collective intelligence is the practical hedge against this concentration of power. Fugu simply routes around vendor restrictions by relying on an entirely swappable agent pool." By acting as a sophisticated coordinator rather than a standalone foundation model, Fugu matches the output quality of top-tier models like Fable and Mythos on third-party benchmarks of agentic tasks, while fundamentally altering how developers deploy critical AI infrastructure. How Sakana Fugu works and where it beats Anthropic's Claude Fable 5 At its core, Sakana Fugu operates like a master general contractor. When presented with a complex request, Fugu does not attempt to execute every step itself. Instead, it breaks the problem down, delegates sub-tasks to a pool of expert foundation models, verifies their work, and synthesizes the final output. "Fugu is itself an LLM, trained to call various LLMs in an agent pool, including instances of itself recursively," the Sakana AI team noted in their technical release. Grounded in two of Sakana's 2026 research papers, TRINITY and the Conductor, the system autonomously manages the entire lifecycle of model selection and verification using learned coordination strategies rather than hand-designed workflows. To the end user, this multi-agent swarm is entirely abstracted behind a standard API endpoint. Sakana AI is offering two variants of the system to cater to different operational workloads: * Fugu: A high-speed, low-latency model optimized for everyday tasks. It is designed to act as the default engine for interactive chatbots and integrates directly into coding environments like Codex. * Fugu Ultra: The flagship tier engineered for complex, high-stakes tasks such as AI research, cybersecurity analysis, and multi-step patent investigations. According to Sakana, Fugu Ultra coordinates a deeper pool of experts and matches industry-leading monolithic models across rigorous scientific and reasoning benchmarks. Additionally, on the pay-as-you-go plan, standard Fugu charges a dynamic rate based on the specific underlying models activated, whereas Fugu Ultra utilizes a fixed pricing structure starting at $5 per million input tokens and $30 per million output tokens. As indicated by benchmark charts shared by Sakana, Fugu actually exceeds the performance of Anthropic's Claude Fable 5 on LiveCodeBench, an open source benchmark testing coding performance on regularly refreshed, software problem-solving tasks (Fugu Ultra: 93.2, Fugu: 92.9, Fable: 89.8), and beats the prior Claude Mythos Preview model on GPQA-D (Diamond) , a test of 198 graduate-level multiple-choice questions in biology, physics, and chemistry (Fugu Ultra: 95.5, Fugu: 95.5, Mythos Preview: 94.6). By orchestrating multiple models from different providers, Fugu essentially builds native redundancy into the AI stack. If one provider suffers an outage or faces sudden regulatory restrictions, Fugu routes around the disruption to maintain uptime. Licensing and availability Fugu is offered as a commercial, proprietary API service, not an open-source framework. Because Sakana's core intellectual property lies in its non-obvious collaboration patterns, the specific routing information -- meaning exactly which underlying models Fugu selects for a given query -- remains proprietary and is intentionally hidden from the user. However, Sakana offers critical controls for enterprise data compliance. Developers can explicitly opt specific models or providers out of their Fugu routing pool to maintain strict corporate privacy standards. Additionally, users can opt out of having their prompts used for future training data. Geographically, Fugu is restricted from operating within the European Union (EU) and European Economic Area (EEA) while Sakana works to align its black-box data routing architecture with GDPR regulations. Pricing is fairly steep Fugu is available immediately in most regions -- with the temporary exception of the EU and EEA -- at subscription tiers and pay-as-you-go pricing. Teams can opt for monthly subscription allowances designed for individual or hands-on use: a Standard tier at $20/month for lightweight workflows, a Pro tier at $100/month providing 10x standard usage, and a Max tier at $200/month offering 20x usage for continuous, long-running tasks. I wasn't able to find the actual amount of tokens covered under these plans, but I've reached out to Ha on X for more information. As part of the initial rollout, Sakana is offering a free second month for users who subscribe to any tier by July 31, 2026. For enterprise scaling and production deployments, Sakana offers an elastic pay-as-you-go plan. Crucially for high-stakes environments, requests made under this consumption-based model are served at a higher priority than those from monthly subscription plans. Under this framework, the standard Fugu engine charges the single rate of the highest-tier underlying model involved in a query, without ever stacking multi-agent fees. The flagship Fugu Ultra tier (fugu-ultra-20260615) utilizes a fixed pricing structure per one million tokens: $5 for input, $30 for output, and $0.50 for cached input. These rates increase to $10, $45, and $1.00 respectively for extreme workloads utilizing context windows above 272K tokens. That puts it among the more expensive options compared to single AI models via provider APIs: Developers modeling operational costs should also note a significant architectural caveat in how Fugu bills for its multi-agent capabilities. According to the developer documentation, Fugu Ultra's API responses include detailed usage fields that separate user-visible token generation from internal orchestration work. The background tokens consumed and generated when Fugu delegates sub-tasks, verifies code, or routes between underlying agents are not absorbed by the provider; they represent real token usage and are counted toward the final price of the request at standard rates. The Orchestration landscape: Fugu vs. The Field and notable benchmark performance To understand Fugu's position in the mid-2026 AI ecosystem, it is critical to distinguish between model routing and multi-agent orchestration. Over the past year, enterprise adoption of standard routing platforms -- such as Not Diamond, Martian, and the open-source RouteLLM framework -- has skyrocketed. These systems act as intelligent air traffic controllers; using semantic classifiers or meta-models, they analyze an incoming prompt and predict which single foundation model will yield the highest quality or most cost-effective response, dispatching the query accordingly. Fugu operates on a fundamentally different paradigm. Rather than making a one-shot routing decision, Fugu aligns more closely with complex multi-round systems like Router-R1 (a framework introduced at NeurIPS 2025). It breaks a query down, interleaves reasoning with delegation, and dynamically assigns sub-tasks to multiple models in parallel or sequence before synthesizing a final output. While frameworks like LangGraph, CrewAI, and Microsoft AutoGen offer developers the tools to build similar multi-agent systems, they require immense manual configuration -- defining roles, setting up conditional edges, and managing state across long-running loops. Fugu abstracts this operational overhead entirely. It is essentially a LangGraph-style workflow packaged as a single, black-box API endpoint. An orchestration system is ultimately bounded by the raw capabilities of the underlying models in its pool, a reality reflected in Sakana's own benchmark testing against standalone frontier models. On rigorous coding and agentic tasks, collective intelligence shows a distinct advantage over standard models. Fugu Ultra posted a 73.7 on SWE-Bench Pro, significantly outperforming Anthropic's Claude Opus 4.8 (69.2) and OpenAI's GPT-5.5 (58.6). However, Fugu is not a silver bullet, and its performance is not a clean sweep across the board. When compared to highly specialized or restricted-access monolithic models, Fugu occasionally trails: * SWE-Bench Pro: While Fugu Ultra (73.7) beat most accessible models, it was comfortably eclipsed by Anthropic's limited-access Fable 5 (80.0), which is currently absent from Fugu's swappable pool due to the U.S. government's export control order and Anthropic's subsequent response to remove the model entirely from global usage. * Humanity's Last Exam: Fugu Ultra (50.0) narrowly edged out Opus 4.8 (49.8), but again fell short of Fable 5 (53.3). * Long-Context and Security: On the MRCRv2 long-context-recall test, OpenAI's GPT-5.5 maintained the lead (94.8 vs Fugu Ultra's 93.6), and Opus 4.8 remained the top performer on the CTI-REALM cybersecurity benchmark (69.6 vs Fugu Ultra's 69.4). The quantitative data points to a clear conclusion: Fugu is highly effective at boosting performance on messy, multi-step tasks (like writing a complex HTML5 game from scratch) by leaning on the combined strengths of multiple mid-tier and high-tier models. However, for sheer brute-force reasoning within a single, highly constrained domain, the industry's largest standalone models still hold the edge -- provided an enterprise can maintain uninterrupted access to them. Background on Sakana's formation and noteworthy achievements to date Sakana AI was formed in Tokyo in 2023 by Llion Jones, a co-author of Google's foundational 2017 "Attention Is All You Need" paper, and David Ha, the former head of research at Stability AI. Disillusioned by large tech company bureaucracy and the industry's hyper-fixation on scaling single, massive foundational models, the founders built Sakana around principles of biomimicry and evolutionary computing. The company's name, derived from the Japanese word for fish, reflects its core technical thesis: utilizing collective "swarm" intelligence rather than brute-force compute. Following a $2.6 billion Series B valuation in late 2025 and the recent June 2026 launch of Marlin -- an autonomous, eight-hour research agent for the B2B sector -- Fugu represents the commercialization of Sakana's multi-agent routing technology for everyday developers. A mixed reception among the broader AI community online The developer community has responded to Fugu by rigorously testing its practical tradeoffs, weighing its routing efficiencies against the sheer power of monolithic foundation models. AI observer, developer and influencer Chris (@ChrissGPT on X) highlighted the specific utility of Fugu over raw foundational AI. "For a single clean prompt, you probably would [use Fable 5, Mythos, or GPT-5.5 directly]," he noted, but argued that Fugu's true value emerges in messy, multi-step environments. "...whether it involves delegation, verification, synthesis, code review, research loops, security analysis... the more it would make sense to use this," he wrote. Chris also pointed out the strategic geopolitical advantage of Fugu's architecture, noting that if frontier AI access is abruptly revoked due to regulation or export controls, an orchestrator can dynamically swap models to prevent a total system failure. Creative agency owner Mark Santos (@markksantos) of Mark Studios provided a direct, real-world comparison by tasking both Fugu Ultra and Claude Opus 4.8 with building a "Crossy Road" game clone using Three.js. The results underscored the operational differences between an orchestrator and a monolithic giant: * Sakana Fugu Ultra: Completed the task in 22 minutes using ~89,000 tokens for roughly $7.32. However, the final game suffered from minor logic errors, such as inverted directional turns and wonky camera angles. * Claude Opus 4.8: Took 79 minutes, burned ~940,000 tokens for nearly $37.85, and got stuck in a retry loop requiring human intervention. Despite the inefficiency, it ultimately produced superior application design and functionality. Santos concluded the experiment by stating, "In terms of application functionality, quality, and design, Opus won. In terms of model speed and performance, Fugu... won". Elie Bakouch, a research engineer at cloud-based, open AI infrastructure and systems provider Prime Intellect, pointed out on X that "to be clear, this is a closed source orchestrator on top of closed source models. if before you didn't control the models, now you don't even control which ones are used or how much. this is not 'AI sovereignty'..." These early tests and reactions mirror the sentiment summarized by Reddit user GreedyWorking1499 in initial platform discussions: "Until proven otherwise, this is just a highly advanced router/wrapper, not a fundamental not a fundamental leap in intelligence like Mythos/Fable was." Yet, as enterprises increasingly demand fail-safes against single-vendor reliance, Sakana is proving that packaging collective intelligence into a single API endpoint is a highly viable commercial path.
[3]
Is Sakana Fugu really better than Anthropic's Fable 5 and Mythos preview?
Japanese artificial intelligence (AI) startup Sakana AI has launched its multiagent orchestration system Sakana Fugu, claiming it outperforms American AI startup Anthropic's popular frontier models, Fable 5 and Mythos Preview. Here is all you need to know about the AI system. It is a multi-agent AI system that presents itself as a single model. Instead of relying on a fixed workflow, it dynamically selects, coordinates, and orchestrates specialised AI models for each task, optimising performance across coding, reasoning, research, and other complex workloads. This means that instead of being a single traditional standalone large language model (LLM), Fugu acts as a conductor or a smart project manager that evaluates your prompt, breaks it down, and coordinates a hidden pool of frontier AI models (including GPT, Claude Opus, and Gemini) to solve the task. Who are the people behind Sakana Fugu? The startup, founded in 2023, is led by David Ha and Llion Jones, who was a co-author of the famous foundational Google paper, 'Attention Is All You Need' (2017). Tokyo-based Sakana AI closed a ÂĄ20 billion (approximately $135 million) Series B funding round at a post-money valuation of $2.65 billion last November, according to a TechCrunch report. How does the agentic AI system work? Per the startup's website, the system is built on two research papers -- TRINITY and Conductor -- presented at the International Conference on Learning Representations (ICLR) 2026 edition, one of the world's top machine learning conferences. Fugu learns how to assemble expert agents, assign roles, and coordinate collaboration patterns rather than following predefined structures. Users access the system through a single OpenAI-compatible application programming interface (API), a standardised way of accessing different models without changing coding language, while Fugu handles model routing, switching, and orchestration in the background. The platform offers two versions. Fugu is designed for everyday use, balancing performance and latency (the delay in sending and receiving a response) for coding, code review, chatbots, and research tasks. Fugu Ultra prioritises answer quality, coordinating a larger pool of agents for demanding workloads such as paper reproduction, cybersecurity analysis and patent research. How does it fare against competitors? Per Sakana AI, Fugu Ultra matches or exceeds leading frontier models on several coding, reasoning, scientific, and agentic benchmarks, including SWE Bench Pro, LiveCodeBench, GPQA-D, and Humanity's Last Exam. The company argues that orchestrating multiple strong models can outperform any individual model on complex tasks. However, Sakana Fugu is not a frontier foundation model in the same sense as Claude Opus, GPT 5.5 or Gemini 3.1 Pro. Instead, it coordinates a pool of underlying AI models. While users interact with Fugu as a single model, its performance is dependent on the models in its agent pool. Hence, while it outperforms other foundational LLMs, the startup explicitly stated in its blogpost that users cannot see which underlying models were used to process their queries, because model selection and routing are proprietary. That strongly suggests Fugu's capabilities depend, at least in part, on access to other models rather than solely on a new foundation model they trained themselves. How is it priced? The model is available through both subscriptions and pay-as-you-go usage. Subscription plans range from $20 to $200 per month, while Fugu Ultra API pricing starts at $5 per million input tokens and $30 per million output tokens. Users can configure agent participation in standard Fugu for privacy and compliance needs, while Fugu Ultra uses a fixed agent pool.
[4]
Fugu Ultra Claims to Beat Mythos and Fable in Standardized Benchmarks
Sakana Lab's latest innovation, Fugu Ultra, has sparked significant discussion in the AI community. Unlike traditional standalone models, Fugu Ultra operates as an orchestrator, intelligently routing tasks to specialized AI systems based on their strengths. This design aims to enhance efficiency and scalability, particularly for organizations managing diverse AI workloads. Universe of AI explores whether Fugu Ultra can truly rival established models like Mythos and Fable, especially in areas such as task delegation and adaptive performance. Early demonstrations suggest promising results in domains like 3D rendering, but questions remain about its reliability across broader applications. Gain insight into the specific strengths and limitations of Fugu Ultra, including its unique routing algorithms and how they compare to traditional AI workflows. Explore how its pricing structure and regulatory challenges impact accessibility, particularly for multinational organizations. You'll also learn about the two distinct versions of Fugu, Regular and Ultra, and their suitability for different use cases, from small businesses to advanced research. This guide provides a balanced breakdown of what Fugu Ultra offers and the hurdles it must overcome to establish itself as a viable alternative in the competitive AI landscape. How Fugu Works: The Art of Model Orchestration At its core, Fugu acts as a "general contractor" for AI tasks, intelligently assigning specific jobs to the most suitable models. Using advanced routing algorithms, Fugu eliminates the need for users to manually select the best tool for a given task. Unlike traditional rule-based systems, Fugu dynamically adapts to the complexity and requirements of each task, making decisions in real time. This orchestration capability could streamline AI workflows, offering organizations a more efficient and scalable way to use multiple AI models. Fugu's orchestration system is designed to optimize performance by analyzing task parameters and selecting models that excel in specific areas. For instance, a task requiring natural language processing might be routed to a model specializing in linguistic analysis, while a separate task involving image recognition could be assigned to a vision-focused AI. This adaptive approach not only saves time but also ensures higher-quality outputs, making it a potentially valuable tool for businesses and researchers alike. Two Versions, Multiple Use Cases Fugu is available in two distinct versions, each tailored to address different user needs: * Regular Fugu: Designed for everyday tasks such as content generation, data analysis and basic automation. This version is ideal for small to medium-sized businesses looking for cost-effective AI solutions. * Fugu Ultra: Built for more complex applications, including advanced research, cybersecurity and high-performance 3D rendering. Fugu Ultra is aimed at enterprises and organizations requiring innovative capabilities. Early demonstrations suggest that Fugu Ultra excels in task efficiency and rendering, reportedly outperforming some standalone models in these areas. However, its effectiveness in other domains, such as real-time decision-making or large-scale data processing, remains less clear. This leaves potential users with unanswered questions about its versatility and reliability across diverse applications. Expand your understanding of Claude Fable with additional resources from our extensive library of articles. Performance Claims: Can Fugu Compete? Sakana Lab has made bold claims about Fugu's performance, asserting that it can match or even surpass leading models like Opus 4.8 and GPT 5.5 in standardized benchmarks. Internal testing indicates that Fugu Ultra may rival Mythos and Fable in output quality and task efficiency. These claims, if validated, could position Fugu as a serious contender in the AI market. However, independent verification of these results is currently limited, raising skepticism among industry experts. While benchmarks provide a useful measure of performance, they do not always reflect real-world scenarios. High-stakes applications, such as medical diagnostics or financial forecasting, require not only accuracy but also reliability and transparency. Fugu's ability to meet these demands remains uncertain and its success will likely depend on how well it performs under practical conditions. Pricing and Accessibility Fugu's pricing structure is designed to attract a wide range of users, offering flexibility for different budgets and usage levels: * Input tokens: $5 per million. * Output tokens: $30 per million. * Subscription plans: Range from $20 to $200 per month, depending on usage needs. Despite its competitive pricing, Fugu faces significant accessibility challenges. It is currently unavailable in the European Union due to privacy compliance issues, which restrict its reach in one of the largest AI markets globally. This limitation could hinder its adoption, especially among multinational organizations that require compliance with strict data protection regulations. Addressing these regulatory barriers will be crucial for Fugu's long-term success. Strengths and Weaknesses Fugu offers several advantages, but it also comes with notable limitations: * Strengths: * Weaknesses: Market Context: A Timely Entry Fugu enters the market at a time when concerns over "AI sovereignty" are growing. By reducing dependence on single providers, Fugu offers organizations a way to diversify their AI infrastructure. This makes it particularly appealing to governments and enterprises seeking greater control over their AI ecosystems. Additionally, Fugu's ability to integrate multiple models could help organizations mitigate risks associated with relying on a single AI provider. However, Fugu faces stiff competition from established models like GLM 5.2, which offers similar outputs at lower costs. To succeed, Fugu must demonstrate consistent value and performance, particularly in areas where its competitors have already established a strong foothold. Its ability to adapt to evolving market demands and address user concerns will be critical in determining its future trajectory. Reception and Future Outlook The response to Fugu has been mixed. While some users praise its orchestration capabilities and potential for cost savings, others question its reliability and real-world performance. The absence of independent benchmarking data further complicates its evaluation, making it difficult for potential users to fully assess its capabilities. As Sakana Lab continues to refine Fugu and address its limitations, the model has the potential to carve out a niche in the competitive AI landscape. However, it must overcome significant hurdles, including regulatory challenges, transparency concerns and competition from established players. If these issues are addressed, Fugu could emerge as a valuable tool for organizations seeking a flexible and efficient AI solution. Media Credit: Universe of AI Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.
[5]
Ten days after Fable 5 shutdown, Sakana AI makes its move
A Japanese AI lab, Sakana AI, has launched Fugu and Fugu Ultra, innovative AI systems that claim frontier-level capabilities without relying on a single large model. This development follows US government actions against Anthropic, highlighting concerns about AI model availability. Just ten days after the US government forced Anthropic to pull Fable 5 and Mythos offline globally, a Japanese AI lab launched what may be the most interesting response yet. Tokyo-based Sakana AI has unveiled Fugu and Fugu Ultra, two new AI systems that make a fairly bold claim: frontier-level capabilities without depending on a single frontier model. The timing is not accidental. For years, the AI race has largely been defined by who could build the biggest and most capable model. The events of June 12 introduced a different question altogether: what happens when the model your company relies on suddenly becomes unavailable? Sakana's answer is that perhaps the future isn't one model at all. What Fugu Actually IsUnlike GPT, Claude, Gemini, or most frontier systems, Fugu isn't a monolithic model. Instead, it's an orchestration layer trained to coordinate multiple AI models behind a single API. From a developer's perspective, it behaves like one model. Behind the scenes, however, it distributes work across a pool of specialist models, assigning different roles such as reasoning, execution, verification, and synthesis before combining the results into a final response. The distinction matters. AI orchestration isn't new. Plenty of companies route prompts between models. What Sakana is arguing is that Fugu isn't a rules-based router wrapped in clever prompting. Rather, the orchestration itself has been trained as a model, learning when to delegate, which specialists to use, and how to merge outputs effectively. That approach builds on Sakana's earlier research efforts, including Trinity and Conductor, two orchestration-focused papers presented at ICLR 2026. In simple terms, Fugu's innovation isn't necessarily a bigger brain. It's a better manager. Why People Are Paying Attention The most compelling part of Fugu isn't the benchmark table. It's what Sakana chose to demonstrate. One showcase involved playing four consecutive games of blindfold chess, forcing the system to maintain an accurate internal representation of the board state throughout extended interactions. Another involved designing a mechanical iris mechanism in CAD, where multiple moving components must work together with precise physical constraints. Neither task is particularly flashy. What they test instead is something AI systems often struggle with: maintaining coherence over long, multi-step workflows without gradually drifting off course. That's increasingly becoming the real challenge in the age of AI agents. Generating a good answer is one thing. Remaining reliable through dozens or hundreds of interconnected decisions is another. Early user reports seem to point in the same direction. Developers have described deeper code reviews, while security practitioners have highlighted the system's ability to stay within defined assessment boundaries instead of wandering into unrelated tasks. Those anecdotes should be treated cautiously, but they align with the broader story Sakana is trying to tell. This is less about raw intelligence and more about sustained execution. The Benchmark Story Is More Nuanced Than The Headlines The benchmark results are strong. Fugu Ultra performs competitively against some of the most capable systems available and leads several coding and reasoning evaluations. Sakana's published numbers place it among the strongest models on tests such as LiveCodeBench, TerminalBench, and SWE-Bench Pro. But this is where it's worth resisting the temptation to oversimplify. Fable 5 still outperforms Fugu on certain software engineering benchmarks. GPT-5.5 remains ahead in some long-context evaluations. Other frontier models continue to hold advantages in specialized areas, including cybersecurity. In fact, there are even instances where standard Fugu scores higher than Fugu Ultra. The takeaway isn't that Fugu has overtaken everything else. It's that Sakana has managed to build a system that belongs in the same conversation as the frontier models many developers have spent the last year benchmarking against. That's an achievement on its own. The Bigger Story Isn't PerformanceThe real significance of Fugu may have very little to do with benchmark rankings. The launch arrives at a moment when AI access itself is becoming a geopolitical issue. The removal of Fable 5 and Mythos demonstrated a reality many organizations hadn't fully confronted: a model can disappear overnight because of regulatory decisions entirely outside a company's control. Sakana appears to have built Fugu around that exact problem. Because the system operates as an orchestrator across multiple models, the disappearance of one provider doesn't necessarily break the product. Models can be swapped, replaced, or rerouted while maintaining continuity for users. Whether that makes Fugu truly resilient remains an open question. But it does represent a different philosophy. Instead of asking which model is best, Sakana is asking what happens when your best model becomes unavailable. That's a question the AI industry may have to answer far more often in the years ahead. The Catch There are some important caveats. Fugu is not open source. The orchestration layer is proprietary, and Sakana has not disclosed the full composition of the model pool operating underneath it. That raises a reasonable question: how much of Fugu's performance comes from Sakana's orchestration technology versus the frontier models it coordinates? At the moment, outside observers don't have enough visibility to answer that definitively. Cost is another consideration. For particularly demanding workloads, Fugu Ultra can reportedly reach roughly $10 per message. That may be acceptable for research or specialized enterprise workflows, but it quickly becomes expensive at scale. And then there's the central marketing claim. Fugu is being positioned as a response to the vulnerability exposed by recent export controls. Yet if a significant portion of its model pool relies on providers ultimately subject to those same controls, then its resilience has practical limits. The architecture may be more flexible. Whether it is genuinely "unbannable" is something that remains unproven. The Bottom LineSakana AI's latest launch feels important not because it introduces another frontier model, but because it challenges the assumption that frontier AI must come from a single model in the first place. Fugu's central idea is that intelligence can emerge from coordination rather than scale alone. Whether that vision ultimately wins remains to be seen. But the timing is hard to ignore. A month ago, the AI conversation was largely about who had the smartest model. Today, a growing part of the discussion is about who can keep their AI running when the smartest model suddenly isn't available anymore. Fugu may be one of the first serious attempts to build for that reality.
[6]
Sakana AI's Fugu: This Japanese AI claims to match Anthropic's Fable 5 and Mythos Preview
What makes the AI of Japan's Sakana unique is that they developed an AI that can hold its own against the most powerful Anthropic's models without being one itself. This AI system is called Fugu, and it works through coordinating a group of available AI models instead of using one giant model. Rather than seeing it as a model, you should see it as a conductor that chooses the correct combination of models to perform a specific task and produces one final output through the coordination. This concept was called "multi-agent system as a model" by Sakana, and the numbers prove how impressive it can be on paper. Also read: Kunal Shah as WhatsApp chief: Why Meta and Zuckerberg picked CRED co-founder? When it comes to some important tests such as SWE Bench Pro (software engineering), LiveCodeBench (coding), and GPQA-Diamond (graduate scientific reasoning), Fugu Ultra, the highest level of this system, rivals Anthropic's Fable 5 and locked Mythos Preview models. In this case, it turns out that Fable 5 and Mythos Preview do not belong to Fugu's agent pool. According to Sakana, it was done intentionally because these models are not publicly accessible, meaning that the system's benchmark numbers are achieved exclusively through models that are freely accessible to anyone. The point is that a Japanese startup has managed to come up with something similar to Anthropic's system developed for many years in secrecy. Also read: The Mac Mini is the best on-device AI computer you can buy: Here's why Of course, how this benchmark parity will manifest itself in practical terms remains to be seen. Multi-agent systems tend to be more difficult to evaluate compared to single-model systems, and their orchestration increases latency, complexity and costs which may not always be reflected in headline figures. However, Fugu's design is great for structured tasks where you can properly route your requests between models. However, the main thing remains the same. When a multi-agent architecture can reach the level of performance of a frontier model using exclusively publicly available pieces, it redefines the idea of cutting-edge AI. It's not necessary to have the best model; sometimes, you only need to combine the existing models properly. Founded by Google Brain alumni David Ha and Llion Jones, Sakana AI has been moving in the direction of nature-inspired and evolutionary AI for quite some time already, since 2023. This is one of its most direct attempts to engage in the frontier lab discussion, but probably not the last one.
Share
Copy Link
Tokyo-based Sakana AI launched Fugu, an AI orchestration system that coordinates multiple specialized AI models to deliver frontier-level performance. The release comes just ten days after U.S. export controls forced Anthropic to revoke access to Claude Mythos 5 and Anthropic Fable 5, highlighting the risks of depending on a single AI provider.

Just ten days after the U.S. government forced Anthropic to pull Anthropic Fable 5 and Claude Mythos 5 offline globally due to export control orders, Tokyo-based Sakana AI unveiled a compelling response. The Japanese AI startup launched Sakana Fugu and Fugu Ultra, an AI orchestration system that claims to deliver frontier performance AI without relying on any single monolithic model
5
. Founded in 2023 by David Ha, formerly of Google Brain, and Llion Jones, co-author of the foundational Google paper "Attention Is All You Need," Sakana AI closed a $135 million Series B funding round at a $2.65 billion valuation last November3
. The timing proves especially significant as organizations confront a new reality: access to critical AI infrastructure can disappear overnight because of regulatory disruptions entirely outside their control.Unlike ChatGPT, Claude, or Gemini, Sakana Fugu doesn't attempt to be the smartest model in the room. Instead, it functions as an AI model orchestrator, acting like a project manager that intelligently delegates work
1
. When a user submits a task, the multi-agent orchestration system analyzes the request, decides which specialized AI models are best suited for different parts of the problem, routes work to those models, evaluates responses, and combines results into a final answer2
. Built on two research papers—TRINITY and Conductor—presented at the International Conference on Learning Representations (ICLR) 2026, the system learns how to assemble expert agents, assign roles, and coordinate collaboration patterns rather than following predefined structures3
. Users access the resilient AI system through a single OpenAI-compatible API while Fugu handles model routing, switching, and task delegation in the background.Sakana AI offers two versions tailored to different operational workloads. Standard Fugu is designed for everyday tasks, balancing performance and latency for coding, chatbots, and research tasks. Fugu Ultra prioritizes answer quality, coordinating a larger pool of agents for demanding AI workflows such as paper reproduction, cybersecurity analysis, and patent research
3
. According to benchmark results shared by Sakana AI, Fugu Ultra actually exceeds Anthropic Fable 5's performance on LiveCodeBench, an open-source benchmark testing coding performance—Fugu Ultra scored 93.2, standard Fugu scored 92.9, while Fable scored 89.82
. On GPQA-D, a test of 198 graduate-level multiple-choice questions in biology, physics, and chemistry, both Fugu Ultra and standard Fugu scored 95.5, beating the prior Claude Mythos Preview model's score of 94.62
. The system also performs competitively on SWE Bench Pro and TerminalBench, placing it among the strongest models available5
.David Ha positioned the AI orchestration system as a more reliable option for enterprise workflows than any single AI provider, directly addressing concerns raised by the June 12 incident. "Relying on a single company's model for national infrastructure is a massive risk," Ha wrote. "As recent export controls have shown, access to top models can disappear overnight. Collective intelligence is the practical hedge against this concentration of power"
2
. By orchestrating multiple models from different providers including GPT, Claude Opus, and Gemini, the system builds native redundancy into AI workflows3
. If one provider suffers an outage or faces sudden geopolitical export controls, Sakana Fugu routes around the disruption to maintain uptime. Developers can explicitly opt specific models or providers out of their routing pool to maintain strict corporate privacy standards2
.The platform is available through both subscriptions and pay-as-you-go usage. Subscription plans range from $20 to $200 per month, while Fugu Ultra API pricing starts at $5 per million input tokens and $30 per million output tokens
3
. Standard Fugu charges a dynamic rate based on the specific underlying specialized AI models activated2
. However, the system is offered as a commercial, proprietary API service, not an open-source framework. Because Sakana's core intellectual property lies in its collaboration patterns, the specific routing information remains proprietary and intentionally hidden from users2
. The system currently faces accessibility challenges, remaining unavailable in the European Union due to privacy compliance issues4
.Early demonstrations suggest promising capabilities for sustained execution across complex tasks. One showcase involved playing four consecutive games of blindfold chess, forcing the system to maintain accurate internal representation throughout extended interactions. Another involved designing a mechanical iris mechanism in CAD with precise physical constraints
5
. Most people think of AI competition as a race to build the biggest and most powerful model, but Sakana Fugu points toward a different possibility. Instead of models attempting to outperform each other individually, the future may belong to systems that know how to combine multiple specialized AI models effectively1
. What makes Sakana AI stand out is that it has trained the orchestration process itself, making the coordinator as important as the workers. The lesson from the Anthropic Fable 5 incident is clear: relying on a single AI provider creates vulnerabilities when access changes, outages occur, pricing shifts, or capabilities evolve.Summarized by
Navi
[1]
[2]
1
Policy and Regulation

2
Policy and Regulation

3
Technology
