2 Sources
[1]
Moonshot AI's new Kimi K2.6 swarms your complex tasks with 1,000 collaborating agents
Moonshot AI pushes autonomous coding to new limits. AI designs and builds full-stack apps from prompts. Persistent agents run for days, handling real operations.

Yesterday, Moonshot AI announced Kimi K2.6, the latest version of its open-source AI model. This release brings enhanced coding capabilities, long multi-step operation execution, and agent swarm capabilities (which doesn't sound terrifying at all).

The company is doubling down on what it calls a "seamless AI coworker experience," based on a reinterpretation of the OpenClaw AI assistant approach to automated AI processing for complex, real-world workflows.

At the core of the Kimi K2.6 release is a substantial improvement in long-horizon coding performance. Long-horizon coding is another way of saying that the AI can execute a very long series of steps without human oversight. Think of the difference between short-horizon and long-horizon as analogous to the difference between an employee you have to check on every 15 minutes and an employee to whom you can simply hand an assignment, knowing that what you need will be on your desk tomorrow morning without fuss or hassle.

Moonshot uses a SysY compiler project as an example of a long-horizon assignment. SysY is a minimalist C-like language used for teaching compiler design to students. Kimi K2.6 designed and built a full SysY compiler from scratch in 10 hours, passing 140 functional tests without human input. The company says this work is the equivalent of four engineers working for two months. Without a doubt, this is a considerable accomplishment. But Moonshot is not alone in using AI to build compilers. Anthropic reported in February that it built a full C compiler (not just a cut-down training-wheels version) using its Opus 4.6 model.
The Anthropic project did fairly well, but it ran into a snag when the agents hit the complex task of compiling the Linux kernel: they got stuck on the same bugs, overwrote each other's work, and broke existing functionality as new features were added. I'm guessing that the Kimi developers chose SysY to keep the overall complexity down, and that this new model would probably hit a similar set of snags to those Anthropic encountered.

Moonshot says that the K2.6 model demonstrates strong generalization (meaning it can handle new and unexpected situations) across languages including Rust, Go, and Python. It also reports that the new model demonstrates reliability across front-end, DevOps, and performance-optimization tasks.

Coding output isn't Kimi K2.6's only big trick. The model can do user-interface design work and then produce coding output from that design. This enables non-coders to build full web applications, including the look and feel, from prompts, and it provides an assist to developers who may not have design expertise.

Going back to the long-horizon claim discussed earlier, Moonshot demonstrated the full-scale project capability by building a series of websites. The company reported that Kimi K2.6 "identified 30 restaurants in Los Angeles without official websites, then automatically generated high-converting landing pages for each. These pages include booking functionality, with all information seamlessly synchronized to their database."

According to Moonshot AI founder Zhilin Yang, "By orchestrating 100 or even 1,000 sub-agents in parallel, we can accomplish complex tasks within a timeframe that is tolerable for the real world." The company calls this "agent swarms." I don't know. I've probably seen Terminator too many times, but while I can see the practical benefit, the very idea of swarms of AI agents is freaky as heck.
The company reports, "It seamlessly coordinates heterogeneous agents to combine complementary skills and broad search capabilities layered with deep research, plus large-scale document analysis fused with long-form writing, and multi-format content generation executed in parallel." It adds, "This compositional intelligence enables the swarm to deliver end-to-end outputs spanning documents, websites, slides, and spreadsheets within a single autonomous run."

The Kimi K2.6 model now supports autonomous agents operating continuously across applications and workflows. This release also improves API interpretation, long-running stability, and safety awareness. The company demonstrated a K2.6-backed agent that "operated autonomously for 5 days, managing monitoring, incident response, and system operations, demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution."

Another capability added to Kimi K2.6 is what the company calls "Claw Groups," which lets multiple OpenClaw-style agents running across devices collaborate with a shared context. A central coordinator dynamically assigns tasks and resolves failures. Moonshot AI says this all becomes a form of collective intelligence: "We are moving beyond simply asking AI a question or assigning AI a task, and entering a phase where human and AI collaborate as genuine partners, combining strengths to solve problems collectively."

As long as the agents don't go and invent time travel, we're probably safe. For now. Would you feel comfortable letting an AI agent run continuously for days, managing systems on your behalf? Let us know in the comments below.
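Moonshot hasn't published how Claw Groups actually works under the hood, but the pattern it describes, a central coordinator that hands tasks to a pool of agents and reassigns them when an agent fails, is a familiar one. A minimal sketch of that pattern (all class and agent names here are hypothetical, not Moonshot's API):

```python
import itertools
import queue

class Coordinator:
    """Toy central coordinator: hands tasks to agents round-robin and
    reassigns a task to the next agent when the current one fails."""

    def __init__(self, agents):
        self.agents = itertools.cycle(agents)  # round-robin assignment
        self.pending = queue.Queue()           # shared task queue
        self.results = {}                      # shared context of outcomes

    def submit(self, task_id, payload):
        self.pending.put((task_id, payload))

    def run(self, max_retries=3):
        while not self.pending.empty():
            task_id, payload = self.pending.get()
            for _ in range(max_retries):
                agent = next(self.agents)      # dynamically assign a task
                try:
                    self.results[task_id] = agent(payload)
                    break                      # success: task resolved
                except Exception:
                    continue                   # failure: reassign to next agent
            else:
                self.results[task_id] = None   # exhausted retries
        return self.results
```

For example, a pool containing one crashing agent and one working agent still completes every task, because the coordinator quietly routes failed work to the next agent in the cycle. A production swarm would add concurrency, persistence, and real failure diagnosis, but the assign-observe-reassign loop is the core of the coordinator role described above.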
[2]
Kimi K2.6 runs agents for days -- and exposes the limits of enterprise orchestration
Most orchestration frameworks were built for agents that run for seconds or minutes. Now that agents are running for hours, and in some cases days, those frameworks are starting to crack.

Several model providers, such as Anthropic with Claude Code and OpenAI with Codex, introduced early support for long-horizon agents through multi-session tasks, subagents and background execution. However, these systems sometimes assume agents are still operating within bounded-time workflows even when they run for extended periods.

Open-source model provider Moonshot AI wants to push beyond that with its new model, Kimi K2.6. Moonshot says the model is designed for continuous execution, with internal use cases including agents that ran for hours and, in one case, five straight days, handling monitoring and incident response autonomously. But the growing use of this type of agent is exposing a critical gap: most orchestration frameworks were not designed for this kind of continuous, stateful execution. Open-source models, such as Kimi K2.6, that rely on agent swarms are making the case that their orchestration approach comes close to managing stateful agents.

The difficulties of orchestrating long-running agents

While it is true that some enterprises would rather bring their own orchestration frameworks to their agentic ecosystem, model providers and agent platforms recognize that offering agent management remains a competitive advantage. Other model providers have begun exploring long-running agents, many through multi-session tasks and background execution. For example, Anthropic's Claude Code orchestrates agents with a lead agent that directs other agents based on a set of user-instructed definitions. OpenAI's Codex runs similarly.

Kimi K2.6 approaches orchestration with an improved version of its Agent Swarms, capable of managing up to 300 sub-agents "executing across 4,000 coordinated steps simultaneously," Moonshot AI wrote in a blog post.
Compared to both Claude Code and Codex, K2.6 relies on the model, rather than pre-defined roles, to determine orchestration. Kimi K2.6 is now available on Hugging Face, through its API, Kimi Code and the Kimi app.

Practitioners experimenting with long-horizon agents say the brittleness runs deeper than prompting can fix. As one practitioner, Maxim Saplin, put it in a blog post, "That does not mean subagents are useless. It means orchestration is still fragile. Right now, it feels more like a product and training problem than something you can solve by writing a sufficiently stern prompt."

The problem long-running agents pose is that their state is difficult to maintain, especially as their environment continues to change while they work. A long-running agent constantly calls different tools and APIs or taps into different databases during its runtime. Most current agents, which may run for only one or two executions, also call different tools, but for at most a minute or so.

Mark Lambert, chief product officer at ArmorCode, which builds an autonomous security platform for enterprises, told VentureBeat in an email that the governance gap is already outpacing deployment. "These agentic systems can now generate code and system changes faster than most organizations can review, remediate, or govern them. This will require more than just additional scanning. Organizations will need stronger AI governance that provides the context, prioritization, and accountability teams need to manage Kimi and other AI-generated risk before they turn into accumulated exposure," Lambert said.

Long-running agents also risk failure without a clear rollback. Most importantly, these types of agents often lack a set of well-defined tasks and dynamically adjust their plans as they run.

Kunal Anand, chief product officer at F5, told VentureBeat in an email that long-horizon agents represent a much bigger architectural shift than most companies were prepared for.
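Neither Moonshot nor the practitioners quoted here describe a concrete mechanism for the state-maintenance problem, but a common engineering answer is checkpointing: persist the agent's state after every completed step so a crashed or restarted run resumes where it left off instead of starting over (and so there is a known-good point to roll back to). A minimal sketch, with all names hypothetical:

```python
import json
import os

class CheckpointedAgent:
    """Toy long-running agent loop that persists its state after every
    step, so a restarted process resumes instead of starting over."""

    def __init__(self, path, steps):
        self.path = path    # checkpoint file on disk
        self.steps = steps  # ordered list of (name, fn) step callables
        self.state = self._load()

    def _load(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)       # resume from last checkpoint
        return {"done": [], "data": {}}   # fresh run

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.state, f)

    def run(self):
        for name, fn in self.steps:
            if name in self.state["done"]:
                continue                  # already completed before restart
            self.state["data"] = fn(self.state["data"])
            self.state["done"].append(name)
            self._save()                  # checkpoint after each step
        return self.state["data"]
```

A real five-day agent would checkpoint far richer state (conversation context, tool-call results, environment observations) to durable storage, but the resume-from-journal loop is the basic defense against losing days of work to one crash.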
"We went from scripts to services to containers to functions, and now to agents as persistent infrastructure. That creates categories we do not yet have good names for: agent runtime, agent gateway, agent identity provider, agent mesh. The API gateway pattern is morphing into something that has to understand goals and workflows, not just endpoints and verbs," Anand said.

Running for 13 hours and even five days

Understanding how to orchestrate agents matters because model capabilities have begun to outpace orchestration innovations, even as enterprises start to look at long-horizon agents. Moonshot AI says the model is built for tasks that reflect "real-world challenges that typically demand weeks or months of collective human effort." In a separate technical document provided to VentureBeat, Moonshot claims K2.6 built a full SysY compiler from scratch in 10 hours, work it characterized as equivalent to a team of four engineers over two months, and passed all 140 functional tests without human intervention.

The team deployed K2.6 on complex engineering tasks, including overhauling an eight-year-old open-source financial matching engine. Moonshot's engineers described a 13-hour execution that "iterated through 12 optimization strategies, initiating over 1,000 tool calls to modify more than 4,000 lines of code precisely."

Moonshot said one of its teams used K2.6 to build an agent that ran autonomously for five days. That agent managed monitoring, incident response and system operations.
Moonshot AI unveiled Kimi K2.6, an open-source AI model that orchestrates up to 1,000 collaborating agents to handle complex tasks autonomously. The model ran continuously for five days in one demonstration, managing monitoring and incident response without human intervention. But the release exposes critical gaps in enterprise orchestration frameworks not designed for such long-running, stateful execution.
Moonshot AI announced Kimi K2.6, its latest open-source AI model designed to handle complex task execution through what the company calls agent swarms. The release marks a significant step in autonomous AI operations, with the model capable of managing up to 300 sub-agents executing across 4,000 coordinated steps simultaneously [2]. According to Moonshot AI founder Zhilin Yang, "By orchestrating 100 or even 1,000 sub-agents in parallel, we can accomplish complex tasks within a timeframe that is tolerable for the real world" [1]. The model is now available on Hugging Face, through its API, Kimi Code and the Kimi app [2].
Source: ZDNet
At the core of Kimi K2.6 lies a substantial improvement in long-horizon coding, enabling the AI to execute extended series of steps without human oversight. The model demonstrated this capability by designing and building a full SysY compiler from scratch in 10 hours, passing all 140 functional tests without human input, work characterized as equivalent to four engineers working for two months [1][2]. The model shows strong generalization across languages including Rust, Go, and Python, with reliability spanning front-end, DevOps, and performance optimization tasks [1]. Beyond code generation, Kimi K2.6 handles user interface design work and produces coding output from those designs, enabling non-coders to build full web applications from prompts [1].

Kimi K2.6 supports autonomous agents operating continuously across applications and workflows, with improved API interpretation, long-running stability, and safety awareness. In one demonstration, a K2.6-backed agent operated autonomously for five days, managing monitoring, incident response, and system operations while demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution [1][2]. The model also identified 30 restaurants in Los Angeles without official websites, then automatically generated high-converting landing pages for each, including booking functionality with all information synchronized to their database [1].
Source: VentureBeat
The release exposes a fundamental challenge: most orchestration frameworks were built for agents that run for seconds or minutes, not hours or days [2]. While model providers like Anthropic and OpenAI introduced early support for long-horizon AI agents through multi-session tasks and background execution, these systems often assume agents operate within bounded-time workflows [2]. Kimi K2.6 approaches orchestration differently, relying on the model rather than pre-defined roles to determine coordination [2]. The difficulty lies in maintaining state as the environment changes during runtime, with agents constantly calling different tools and APIs or tapping into different databases [2].

Mark Lambert, chief product officer at ArmorCode, noted that "these agentic systems can now generate code and system changes faster than most organizations can review, remediate, or govern them," requiring stronger AI governance that provides context, prioritization, and accountability [2]. Kunal Anand, chief product officer at F5, described long-horizon agents as representing "a much bigger architectural shift than most companies were prepared for," creating new categories like agent runtime, agent gateway, and agent mesh [2]. Long-running agents risk failure without clear rollback mechanisms and often lack well-defined tasks, dynamically adjusting their plans as they run [2]. The model's ability to seamlessly coordinate heterogeneous agents combining complementary skills delivers end-to-end outputs spanning documents, websites, slides, and spreadsheets within a single autonomous run [1].

Summarized by Navi