6 Sources
[1]
Mistral bets big on "vibe coding" with new autonomous software-engineering agent
On Tuesday, French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed to work as part of an autonomous software engineering agent. The model achieves a 72.2 percent score on SWE-bench Verified, a benchmark that attempts to test whether AI systems can solve real GitHub issues, putting it among the top-performing open-weights models.

Perhaps more notably, Mistral didn't just release an AI model; it also released a new development app called Mistral Vibe. It's a command line interface (CLI), similar to Claude Code, OpenAI Codex, and Gemini CLI, that lets developers interact with the Devstral models directly in their terminal. The tool can scan file structures and Git status to maintain context across an entire project, make changes across multiple files, and execute shell commands autonomously. Mistral released the CLI under the Apache 2.0 license.

It's always wise to take AI benchmarks with a large grain of salt, but we've heard from employees of the big AI companies that they pay very close attention to how well models do on SWE-bench Verified, which presents AI models with 500 real software engineering problems pulled from GitHub issues in popular Python repositories. The AI must read the issue description, navigate the codebase, and generate a working patch that passes unit tests. While some AI researchers have noted that around 90 percent of the tasks in the benchmark test relatively simple bug fixes that experienced engineers could complete in under an hour, it's one of the few standardized ways to compare coding models.

Alongside the larger AI coding model, Mistral also released Devstral Small 2, a 24 billion parameter version that scores 68 percent on the same benchmark and can run locally on consumer hardware like a laptop, with no Internet connection required.
Both models support a 256,000 token context window, allowing them to process moderately large codebases (though whether that counts as large or small depends heavily on overall project complexity). The company released Devstral 2 under a modified MIT license and Devstral Small 2 under the more permissive Apache 2.0 license.

Devstral 2 is currently free to use through Mistral's API. After the free period ends, pricing will be $0.40 per million input tokens and $2.00 per million output tokens; Devstral Small 2 will cost $0.10 per million input tokens and $0.30 per million output tokens. Mistral says it's about "7x more cost-efficient than Claude Sonnet at real-world tasks." Anthropic's Sonnet 4.5 costs $3 per million input tokens and $15 per million output tokens through the API, with increases depending on the total number of tokens used.

The vibe-coding connection

The name "Mistral Vibe" references "vibe coding," a term that AI researcher Andrej Karpathy coined in February 2025 to describe a style of programming where developers describe what they want in natural language and accept AI-generated code without reviewing it closely. As Karpathy describes it, you can "fully giv[e] in to the vibes, embrace exponentials, and forget that the code even exists." Collins Dictionary named it Word of the Year for 2025.

The vibe coding approach has drawn both enthusiasm and concern. In an interview with Ars Technica in March, developer Simon Willison said, "I really enjoy vibe coding. It's a fun way to try out an idea and prove if it can work." But he also warned that "vibe coding your way to a production codebase is clearly risky. Most of the work we do as software engineers involves evolving existing systems, where the quality and understandability of the underlying code is crucial."
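For a rough sense of what those list prices mean in practice, here is a back-of-the-envelope calculation using only the per-token prices quoted above (Devstral 2 at $0.40/$2.00 per million tokens in/out, Sonnet 4.5 at $3/$15). The workload size is an illustrative assumption, not a measured figure:

```python
def api_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost in dollars for one job; prices are per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Assume a coding-agent session that reads 2M tokens and emits 0.5M.
devstral = api_cost(2_000_000, 500_000, 0.40, 2.00)   # -> $1.80
sonnet = api_cost(2_000_000, 500_000, 3.00, 15.00)    # -> $13.50

print(f"Devstral 2: ${devstral:.2f}, Sonnet 4.5: ${sonnet:.2f}, "
      f"ratio: {sonnet / devstral:.1f}x")
```

At these prices, the hypothetical session comes out roughly 7.5x cheaper on Devstral 2, broadly consistent with Mistral's "7x more cost-efficient" claim, though real ratios will vary with the input/output mix.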
Mistral is betting that Devstral 2 will be able to maintain coherency across entire projects, detect failures, and retry with corrections, and that those claimed abilities will make it suitable for more serious work than simple prototypes and in-house tools. The company says the model can track framework dependencies and handle tasks like bug fixing and modernizing legacy systems at repository scale. We have not experimented with it yet, but you might see an Ars Technica head-to-head test of several AI coding tools soon.
[2]
Mistral AI surfs vibe coding tailwinds with new coding models | TechCrunch
French AI startup Mistral today launched Devstral 2, a new generation of its AI model designed for coding, as the company seeks to catch up to bigger AI labs like Anthropic and other coding-focused LLMs. The announcement follows the recent launch of the Mistral 3 family of open-weight models, and confirms Mistral's intent to close in on its bigger and better-funded AI rivals.

The unicorn is also jumping into the 'vibe-coding' race that has fueled the rise of companies like Cursor and Supabase with Mistral Vibe, a new command-line interface (CLI) aimed at facilitating code automation through natural language, with tools for file manipulation, code searching, version control, and command execution. Mistral AI is betting on the added value of context awareness, which is particularly relevant in business use cases. Similar to its AI assistant, Le Chat, which can remember previous conversations with users and use that context to guide its answers, Vibe CLI features persistent history and can scan file structures and Git statuses to build context that informs its behavior.

This focus on production-grade workflows also explains why Devstral 2 is relatively demanding, requiring at least four H100 GPUs or equivalent for deployment and weighing in at 123 billion parameters. However, the model is also available in a smaller size as Devstral Small, which, at 24 billion parameters, is deployable locally on consumer hardware.

The models differ in their open-source licensing -- Devstral 2 ships under a modified MIT license, while Devstral Small uses Apache 2.0. They also differ in pricing. Devstral 2 is currently free to use via the company's API. After the free period, API pricing will be $0.40/$2.00 per million tokens (input/output) for Devstral 2, and $0.10/$0.30 for Devstral Small. Mistral has partnered with agent tools Kilo Code and Cline to bring Devstral 2 to users, while Mistral Vibe CLI is available as an extension in Zed for use inside the IDE.
Europe's champion AI lab, Mistral is currently valued at €11.7 billion (approximately $13.8 billion) following a Series C funding round led by Dutch semiconductor company ASML, which invested €1.3 billion (approximately $1.5 billion) in September.
[3]
Mistral launches powerful Devstral 2 coding model including open source, laptop-friendly version
French AI startup Mistral has weathered a rocky period of public questioning over the last year to emerge, here in December 2025, with new, crowd-pleasing models for enterprise and indie developers. Just days after releasing its powerful open source, general purpose Mistral 3 LLM family for edge devices and local hardware, the company returned today to debut Devstral 2. The release includes a new pair of models optimized for software engineering tasks -- again, with one small enough to run on a single laptop, offline and privately -- alongside Mistral Vibe, a command-line interface (CLI) agent designed to allow developers to call the models up directly within their terminal environments.

The models are fast, lean, and open -- at least in theory. But the real story lies not just in the benchmarks, but in how Mistral is packaging this capability: one model fully free, another conditionally so, and a terminal interface built to scale with either. It's an attempt not just to match proprietary systems like Claude and GPT-4 in performance, but to compete with them on developer experience -- and to do so while holding onto the flag of open source.

Both models are available now for free for a limited time via Mistral's API and Hugging Face. The full Devstral 2 model is supported out of the box by the community inference engine vLLM and on the open source agentic coding platform Kilo Code.

A Coding Model Meant to Drive

At the top of the announcement is Devstral 2, a 123-billion parameter dense transformer with a 256K-token context window, engineered specifically for agentic software development. Mistral says the model achieves 72.2% on SWE-bench Verified, a benchmark designed to evaluate long-context software engineering tasks in real-world repositories. The smaller sibling, Devstral Small 2, weighs in at 24B parameters, with the same long context window and a performance of 68.0% on SWE-bench.
On paper, that makes it the strongest open-weight model of its size, even outscoring many 70B-class competitors. But the performance story isn't just about raw percentages. Mistral is betting that efficient intelligence beats scale, and has made much of the fact that Devstral 2 is:

* 5× smaller than DeepSeek V3.2
* 8× smaller than Kimi K2
* Yet still matches or surpasses them on key software reasoning benchmarks.

Human evaluations back this up. In side-by-side comparisons:

* Devstral 2 beat DeepSeek V3.2 in 42.8% of tasks, losing only 28.6%.
* Against Claude Sonnet 4.5, it lost more often (53.1%) -- a reminder that while the gap is narrowing, closed models still lead in overall preference.

Still, for an open-weight model, these results place Devstral 2 at the frontier of what's currently available to run and modify independently.

Vibe CLI: A Terminal-Native Agent

Alongside the models, Mistral released Vibe CLI, a command-line assistant that integrates directly with Devstral models. It's not an IDE plugin or a ChatGPT-style code explainer. It's a native interface designed for project-wide code understanding and orchestration, built to live inside the developer's actual workflow. Vibe brings a surprising degree of intelligence to the terminal:

* It reads your file tree and Git status to understand project scope.
* It lets you reference files with @, run shell commands with !, and toggle behavior with slash commands.
* It orchestrates changes across multiple files, tracks dependencies, retries failed executions, and can even refactor at architectural scale.

Unlike most developer agents, which simulate a REPL from within a chat UI, Vibe starts with the shell and pulls intelligence in from there. It's programmable, scriptable, and themeable. And it's released under the Apache 2.0 license, meaning it's truly free to use -- in commercial settings, internal tools, or open-source extensions.
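Based only on the features described above (@ file references, ! shell passthrough, slash commands), a Vibe CLI session might look something like the following. This is a hypothetical sketch; the exact prompts and command syntax may differ from the shipped tool.

```shell
$ vibe                           # start the agent in the project root
> @src/payments.py explain the retry logic in this file
> refactor that retry logic into a shared helper and update its callers
> !git diff --stat               # run a shell command without leaving the session
> /help                          # slash commands toggle agent behavior
```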
Licensing Structure: Open-ish -- With Revenue Limitations

At first glance, Mistral's licensing approach appears straightforward: the models are open-weight and publicly available. But a closer look reveals a line drawn through the middle of the release, with different rules for different users.

Devstral Small 2, the 24-billion parameter variant, is covered under a standard, enterprise- and developer-friendly Apache 2.0 license. That's a gold standard in open source: no revenue restrictions, no fine print, no need to check with legal. Enterprises can use it in production, embed it into products, and redistribute fine-tuned versions without asking for permission.

Devstral 2, the flagship 123B model, is released under what Mistral calls a "modified MIT license." That phrase sounds innocuous, but the modification introduces a critical limitation: any company making more than $20 million in monthly revenue cannot use the model at all -- not even internally -- without securing a separate commercial license from Mistral. "You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company [...] exceeds $20 million," the license reads. The clause applies not only to the base model, but to derivatives, fine-tuned versions, and redistributed variants, regardless of who hosts them.

In effect, it means that while the weights are "open," their use is gated for large enterprises -- unless they're willing to engage with Mistral's sales team or use the hosted API at metered pricing. To draw an analogy: Apache 2.0 is like a public library -- you walk in, borrow the book, and use it however you need. Mistral's modified MIT license is more like a corporate co-working space that's free for freelancers but charges rent once your company hits a certain size.
Weighing Devstral Small 2 for Enterprise Use

This division raises an obvious question for larger companies: can Devstral Small 2, with its more permissive and unrestricted Apache 2.0 licensing, serve as a viable alternative for medium-to-large enterprises? The answer depends on context. Devstral Small 2 scores 68.0% on SWE-bench, significantly ahead of many larger open models, and remains deployable on single-GPU or CPU-only setups. For teams focused on:

* internal tooling,
* on-prem deployment,
* low-latency edge inference,

...it offers a rare combination of legality, performance, and convenience. But the performance gap from Devstral 2 is real. For multi-agent setups, deep monorepo refactoring, or long-context code analysis, that 4-point benchmark delta may understate the actual experience difference. For most enterprises, Devstral Small 2 will serve either as a low-friction way to prototype -- or as a pragmatic bridge until licensing for Devstral 2 becomes feasible. It is not a drop-in replacement for the flagship, but it may be "good enough" in specific production slices, particularly when paired with Vibe CLI.

Because Devstral Small 2 can be run entirely offline -- including on a single-GPU machine or a sufficiently specced laptop -- it unlocks a critical use case for developers and teams operating in tightly controlled environments. Whether you're a solo indie building tools on the go, or part of a company with strict data governance or compliance mandates, the ability to run a performant, long-context coding model without ever hitting the internet is a powerful differentiator. No cloud calls, no third-party telemetry, no risk of data leakage -- just local inference with full visibility and control. This matters in industries like finance, healthcare, defense, and advanced manufacturing, where data often cannot leave the network perimeter.
But it's just as useful for developers who prefer autonomy over vendor lock-in -- or who want their tools to work the same on a plane, in the field, or inside an air-gapped lab. In a market where most top-tier code models are delivered as API-only SaaS products, Devstral Small 2 offers a rare level of portability, privacy, and ownership. In that sense, Mistral isn't just offering open models -- it's offering multiple paths to adoption, depending on your scale, compliance posture, and willingness to engage.

Integration, Infrastructure, and Access

From a technical standpoint, Mistral's models are built for deployment. Devstral 2 requires a minimum of 4× H100-class GPUs and is already available on build.nvidia.com. Devstral Small 2 can run on a single GPU or on a CPU such as those in a standard laptop, making it accessible to solo developers and embedded teams alike. Both models support quantized FP4 and FP8 weights and are compatible with vLLM for scalable inference. Fine-tuning is supported out of the box.

API pricing -- after the free introductory window -- follows a token-based structure:

* Devstral 2: $0.40 per million input tokens / $2.00 per million output tokens
* Devstral Small 2: $0.10 input / $0.30 output

That pricing sits just below OpenAI's GPT-4 Turbo, and well below Anthropic's Claude Sonnet at comparable performance levels.
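The self-hosted path described above (vLLM on 4× H100-class GPUs) could be sketched roughly as follows. This is a deployment sketch under stated assumptions, not a tested recipe: the Hugging Face model identifier is a placeholder, and flags should be checked against current vLLM documentation.

```shell
# Deployment sketch only. MODEL_ID is a hypothetical placeholder;
# substitute the actual Hugging Face repo name for Devstral 2.
MODEL_ID="mistralai/Devstral-2"

# Serve the 123B model sharded across 4 GPUs behind vLLM's
# OpenAI-compatible HTTP server.
vllm serve "$MODEL_ID" \
  --tensor-parallel-size 4 \
  --max-model-len 256000
```

The smaller Devstral Small 2 would drop the tensor-parallel sharding and fit on a single GPU, which is what makes the offline, single-machine use cases discussed above practical.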
Developer Reception: Ground-Level Buzz

On X (formerly Twitter), developers reacted quickly with a wave of positive reception. Hugging Face's Head of Product, Victor Mustar, asked whether the small, Apache 2.0-licensed variant was the "new local coding king" -- that is, the one developers could run on their laptops directly and privately, without an internet connection. The popular AI news and rumors account TestingCatalogNews posted that it was "SOTTA in coding," or "State Of The Tiny Art." Another user, @xlr8harder, took issue with the custom licensing terms for Devstral 2, writing: "calling the Devstral 2 license 'modified MIT' is misleading at best. It's a proprietary license with MIT-like attribution requirements." While the tone was critical, it reflected the attention Mistral's license structuring was receiving, particularly among developers familiar with open-use norms.

Strategic Context: From Codestral to Devstral and Mistral 3

Mistral's steady push into software development tools didn't start with Devstral 2 -- it began in May 2024 with Codestral, the company's first code-focused large language model. A 22-billion parameter system trained on more than 80 programming languages, Codestral was designed for use in developer environments ranging from basic autocompletions to full function generation. The model launched under a non-commercial license but still outperformed heavyweight competitors like CodeLlama 70B and DeepSeek Coder 33B in early benchmarks such as HumanEval and RepoBench. Codestral's release marked Mistral's first move into the competitive coding-model space, but it also established a now-familiar pattern: technically lean models with surprisingly strong results, a wide context window, and licensing choices that invited developer experimentation. Industry partners including JetBrains, LlamaIndex, and LangChain quickly began integrating the model into their workflows, citing its speed and tool compatibility as key differentiators.
One year later, the company followed up with Devstral, a 24B model purpose-built for "agentic" behavior -- handling long-range reasoning, file navigation, and autonomous code modification. Released in partnership with All Hands AI and licensed under Apache 2.0, Devstral was notable not just for its portability (it could run on a MacBook or an RTX 4090), but for its performance: it beat out several closed models on SWE-bench Verified, a benchmark of 500 real-world GitHub issues.

Then came Mistral 3, announced in December 2025 as a portfolio of 10 open-weight models targeting everything from drones and smartphones to cloud infrastructure. This suite included both high-end models like Mistral Large 3 (a mixture-of-experts system with 41 billion active parameters and a 256K context window) and lightweight "Ministral" variants that could run on 4GB of VRAM. All were licensed under Apache 2.0, reinforcing Mistral's commitment to flexible, edge-friendly deployment. Mistral 3 positioned the company not as a direct competitor to frontier models like GPT-5 or Gemini 3, but as a developer-first platform for customized, localized AI systems. Co-founder Guillaume Lample described the vision as "distributed intelligence" -- many smaller systems tuned for specific tasks and running outside centralized infrastructure. "In more than 90% of cases, a small model can do the job," he told VentureBeat. "It doesn't have to be a model with hundreds of billions of parameters."

That broader strategy helps explain the significance of Devstral 2. It's not a one-off release but a continuation of Mistral's long-running commitment to code agents, local-first deployment, and open-weight availability -- an ecosystem that began with Codestral, matured through Devstral, and scaled up with Mistral 3. Devstral 2, in this framing, is not just a model. It's the next version of a playbook that's been unfolding in public for over a year.
Final Thoughts (For Now): A Fork in the Road

With Devstral 2, Devstral Small 2, and Vibe CLI, Mistral AI has drawn a clear map for developers and companies alike. The tools are fast, capable, and thoughtfully integrated. But they also present a choice -- not just in architecture, but in how and where you're allowed to use them. If you're an individual developer, small startup, or open-source maintainer, this is one of the most powerful AI systems you can freely run today. If you're a Fortune 500 engineering lead, you'll need to either talk to Mistral -- or settle for the smaller model and make it work. In a market increasingly dominated by black-box models and SaaS lock-in, Mistral's offer is still a breath of fresh air. Just read the fine print before you start building.
[4]
Mistral AI's Devstral 2 is an open-weights vibe coding model built to rival the best proprietary systems - SiliconANGLE
French artificial intelligence startup Mistral AI is jumping into the vibe coding market with the launch of Devstral 2, a new model that's built specifically to handle advanced coding tasks. Announced today, Devstral 2 is a 123-billion parameter model with autonomous software engineering capabilities that the company says can rival those of the best proprietary vibe coding systems, while undercutting their costs by almost 85%. Devstral 2 is being released alongside Mistral Vibe, a new command-line interface that allows developers to access the model and execute complex tasks using natural language prompts. There's also a smaller model called Devstral Small 2, which has just 24 billion parameters and is optimized for local deployment.

Mistral AI is trying to provide an alternative to proprietary coding bots from companies such as Google LLC and OpenAI, which try to lock users into their ecosystems, positioning itself as a true open-source vibe coding platform. The Mistral Vibe CLI is the key interface here, leveraging the underlying Devstral models to transform users' natural language prompts into real-world code changes. The system is comprehensive: rather than just generating snippets of code, it's meant to explore, modify, and create changes across an entire codebase. It can operate within any third-party integrated development environment or as a standalone terminal utility, with a suite of tools that enables the model to edit files, search codebases, manage version control, and execute shell commands with full autonomy. Mistral AI said Devstral 2's ability to scan file structures and Git statuses gives it the "project-aware context" to update dependencies or make changes across an entire project without losing track of what it's doing. Mistral AI says its primary focus with Devstral 2 is operational efficiency.
It combines a 123-billion parameter dense transformer architecture with a 256,000 token context window to achieve an impressive 72.2% score on the SWE-bench benchmark, ranking it among the top-performing coding models in the business. Among rival open-weight models, only DeepSeek V3.2 has achieved a superior score; Google's, OpenAI's, and Anthropic PBC's cutting-edge models also rank higher. The Devstral Small 2 variant also demonstrates impressive performance considering its much smaller footprint: its 68% score on the same benchmark means it can go toe-to-toe with models that are five times larger. The model's lightweight nature means it's efficient enough to run on consumer hardware such as a standard laptop, eliminating the latency that comes with AI running in the cloud.

However, Mistral AI says Devstral 2's biggest advantage is cost, with the model priced at $0.40 per million input tokens and $2 per million output tokens via its application programming interface. That undercuts models such as Anthropic's Claude Sonnet 4.5 by around seven times. The most obvious difference between Devstral 2 and Devstral Small 2 is size, but Mistral AI has also issued the models under different licensing terms. Devstral 2 ships with a modified MIT license that imposes revenue-based restrictions on users, whereas Devstral Small 2 is available under the more permissive Apache 2.0 license, meaning no restrictions on its use. Developers can therefore modify the smaller model and integrate it any way they desire, without the legal headaches of proprietary systems. It's easy to imagine some enterprises opting for a hybrid workflow, using the heavy 123-billion parameter model for complex architectural planning tasks and the smaller 24-billion parameter version for more rapid, private code changes that remain within their network boundaries.
Mistral AI is launching Devstral 2 at a time when competition in the vibe coding segment intensifies, but its open-weights strategy may find some appeal. Competitors have been trying to lock in developers, with Google's newly announced partnership with Replit Inc. exemplifying this push by bundling a powerful model, IDE and cloud compute into a single stack. Google's Gemini 3 Pro is tightly integrated with the company's new AntiGravity IDE and also seems to want to keep developers locked into its ecosystem, while Anthropic is said to be building a dedicated runtime for its own coding agents following its acquisition of the JavaScript toolkit Bun. With Devstral 2, Mistral AI provides developers with a welcome alternative, combining open weights with local deployment that sits firmly outside the walled gardens being built by many of its rivals.
[5]
Devstral's New Coding Model Rivals Bigger Systems: 72.2% on SWE-bench and Low Token Costs
What if the future of coding wasn't just faster, but smarter, more accessible, and surprisingly affordable? Enter Mistral Devstral 2, the latest open source large language model (LLM) that's rewriting the rules of back-end development. With a staggering 72.2% score on SWE-bench, this model doesn't just compete with giants like DeepSeek V3.2; it outpaces them in cost-efficiency and adaptability. Imagine a tool that not only automates multi-file changes and debugs with precision but also modernizes legacy code seamlessly, all while running on consumer-grade hardware. It's not just a coding assistant; it's a fantastic option for developers, enterprises, and enthusiasts alike.

In this coverage, World of AI explores how Devstral 2 is setting a new benchmark for agentic coding models by combining innovative performance with unprecedented accessibility. You'll discover why its compact yet powerful architecture is reshaping workflows, from automating repetitive tasks to addressing security vulnerabilities. We'll also dive into its unique features, like the Mistral Vibe CLI, which simplifies codebase exploration and execution. Whether you're curious about its open source licensing, intrigued by its ability to run on an RTX 4090 GPU, or eager to see how it stacks up against competitors, this breakdown will reveal why Devstral 2 is more than just a tool; it's a vision for the future of coding.

Devstral 2 delivers exceptional results, achieving a 72.2% score on SWE-bench Verified, a benchmark that evaluates coding model performance. This places it among the highest-performing open-weight coding models available today. Despite its relatively compact architecture, it competes directly with systems like DeepSeek V3.2 and Google's advanced models, while outperforming competitors such as GLM 4.6, MiniMax, and Qwen 3. A key differentiator for Devstral 2 is its remarkable cost-efficiency.
It is up to seven times more economical than alternatives like Claude Sonnet, making it an attractive option for developers and organizations operating on tight budgets. For those requiring a lighter model, Devstral Small provides a scaled-down yet highly capable alternative, ensuring flexibility for a variety of use cases.

Devstral 2 is purpose-built to meet the specific demands of back-end development, offering a range of features that streamline complex workflows, such as automating multi-file changes, debugging, and modernizing legacy code. The model also supports fine-tuning for specific programming languages and enterprise systems, allowing it to adapt to diverse coding environments. These features make Devstral 2 an indispensable tool for production-grade workflows, allowing developers to focus on innovation rather than repetitive tasks.

One of the standout features of Devstral 2 is its ability to run efficiently on consumer-grade hardware. You can deploy the model using an RTX 4090 GPU or a Mac with 32GB of RAM, making it accessible to a wide range of users, from individual developers to large organizations. The pricing structure further enhances its appeal: Devstral 2 is priced at $0.40 per 1 million input tokens and $2 per 1 million output tokens, while Devstral Small offers even lower costs. Additionally, free access is available through platforms like Kilo Code and OpenRouter, ensuring that developers with limited financial resources can still use its capabilities.

To complement the model, Mistral AI has introduced the Mistral Vibe CLI, a command-line interface designed to simplify codebase exploration, modification, and execution. By integrating these tools, developers can focus on higher-level problem-solving while automating routine coding processes.
This combination of advanced functionality and user-friendly tools ensures that Devstral 2 can be seamlessly incorporated into existing workflows. While Devstral 2 excels in back-end development, it does have certain limitations. Its capabilities in front-end development, such as generating user interface elements or animations, are minimal. Additionally, the model's context window is capped at 256K tokens, which may pose challenges for extremely large codebases. However, for the majority of back-end applications, these constraints are unlikely to significantly impact performance.

Both versions of Devstral 2 are available under open licenses, encouraging widespread adoption and collaboration. The larger model operates under a modified MIT license, while the smaller version uses the Apache 2.0 license. Developers can access the models through Mistral's console, chatbot, or third-party integrations, ensuring seamless deployment into existing systems. This flexibility makes it easier to incorporate Devstral 2 into diverse coding environments, whether for individual projects or enterprise-scale applications.

Mistral Devstral 2 represents a significant step forward in the evolution of open source coding models. Its combination of high performance, cost-efficiency, and accessibility makes it a valuable resource for developers and enterprises alike. Whether you're automating debugging processes, modernizing legacy systems, or optimizing workflows, Devstral 2 equips you with the tools needed to excel in today's fast-paced development landscape. By addressing the challenges of back-end development with precision and efficiency, Devstral 2 is poised to shape the future of coding automation and innovation.
[6]
Devstral 2 and Vibe CLI explained: Mistral's bet on open weight coding AI
Mistral expands developer ecosystem through flexible, high-context coding models

Mistral is leaning into its identity as one of the strongest champions of open-weight AI. With the release of Devstral 2 and the new Vibe CLI, the company is positioning itself as a serious contender in the coding assistant space, a market dominated so far by closed models and cloud-locked tools. The launch marks a clear attempt to build a developer stack that is flexible, transparent, and capable of operating across entire codebases rather than isolated snippets.

Devstral 2 arrives in two variants. The headline model is a 123 billion parameter system built specifically for software development tasks. A smaller 24 billion parameter version offers similar behaviour with lower hardware demands, making it feasible to run locally. Both models share a major upgrade in context length, now extended to 256,000 tokens. That scale allows the model to scan long files, understand multi-file structures, and maintain reasoning across large repositories.

Benchmarks show significant gains. On the SWE-bench Verified test, Devstral 2 reaches 72.2 percent, which places it among the most capable open-weight models in coding tasks. The small model hits 68 percent, competitive enough to be practical for hobbyists and small teams who prefer offline use.

Alongside the models, Mistral introduced Vibe CLI, a command-line assistant that acts on entire codebases through natural language prompts. Instead of generating standalone snippets, Vibe works by reading the folder structure, Git status, and current files, then applying changes across multiple locations when needed.
Its tool set includes multi-file refactoring, search and replace, code generation tied to existing architecture, dependency updates, and even shell commands. It is designed to run either in a terminal or inside an IDE, acting as a project-aware companion that understands not just syntax but context. The approach reflects a shift from isolated code generation toward workflow automation, which aligns with how real-world software projects evolve over time.

Mistral is not claiming it has surpassed every proprietary coding model, but it is making an argument built on philosophy and practicality. Devstral 2 is available under permissive licensing, with the small model allowing full commercial use. This means teams can self-host, fine-tune, or integrate the model without vendor restrictions. For developers who work with sensitive codebases or want reproducibility, this openness becomes a major advantage.

In user preference studies shared by the company, compact models from Mistral often matched or exceeded larger competitors on many tasks. While the top closed models still dominate in raw preference scores, Devstral aims to close the gap through efficiency, transparency, and cost-friendliness.
French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed for autonomous software engineering. The model achieves 72.2% on SWE-bench Verified, positioning it among top performers. Alongside the release, Mistral introduced Mistral Vibe CLI, a command-line interface enabling developers to automate coding tasks through natural language while maintaining project-aware context across entire codebases.
French AI startup Mistral AI released Devstral 2 on Tuesday, marking its ambitious entry into the vibe coding market with a 123 billion parameter open-weights coding model built for autonomous software engineering [1]. The AI coding model achieves a 72.2% score on SWE-bench Verified, a benchmark testing whether AI systems can solve real GitHub issues, placing it among the top-performing models in its category [3]. This release follows closely on the heels of the Mistral 3 family launch, confirming the company's intent to close the gap with bigger AI labs like Anthropic, OpenAI, and Google [2].
Source: Geeky Gadgets
The term vibe coding, coined by AI researcher Andrej Karpathy in February 2025, describes a programming style where developers describe desired outcomes in natural language and accept AI-generated code without extensive review [1]. Collins Dictionary named it Word of the Year for 2025, reflecting its growing influence on software development practices.

Alongside Devstral 2, Mistral AI introduced Mistral Vibe CLI, a command-line interface that enables developers to interact with Devstral models directly within their terminal environments [4]. The tool transforms natural language prompts into real-world code changes across entire codebases, rather than merely generating isolated code snippets. Released under the Apache 2.0 license, the CLI features persistent history and can scan file structures and Git status to maintain project-aware context across an entire project [2].
Source: TechCrunch
The Mistral Vibe CLI allows developers to reference files with @, run shell commands with !, and toggle behavior with slash commands [3]. It can orchestrate changes across multiple files, track framework dependencies, detect failures, retry with corrections, and handle tasks like bug fixing and modernizing legacy systems at repository scale [1]. Unlike most developer agents that simulate interfaces from within chat UIs, Vibe starts in the shell and integrates intelligence directly into actual developer workflows.

Mistral AI simultaneously released Devstral Small 2, a 24 billion parameter version that scores 68% on SWE-bench Verified while running locally on consumer hardware such as a laptop, with no Internet connection required [1]. The smaller model can be deployed on an RTX 4090 GPU or a Mac with 32GB of RAM, making it accessible to individual developers and organizations without requiring cloud infrastructure [5].
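To make the @-reference convention concrete, here is a minimal, hypothetical sketch of how a project-aware CLI might expand @path mentions in a prompt into inline file contents before sending the prompt to a model. The function names and formatting are illustrative assumptions, not Vibe's actual implementation:

```python
import re
import tempfile
from pathlib import Path

# Hypothetical sketch: expand "@path/to/file" mentions in a prompt into
# inline file contents, the way a project-aware CLI might build model context.
AT_REF = re.compile(r"@([\w./-]+)")

def expand_file_refs(prompt: str, root: Path) -> str:
    """Replace each @path mention with the referenced file's contents."""
    def replace(match: re.Match) -> str:
        path = root / match.group(1)
        if not path.is_file():
            return match.group(0)  # leave unresolvable references untouched
        body = path.read_text()
        return f"\n--- {match.group(1)} ---\n{body}\n---\n"
    return AT_REF.sub(replace, prompt)

# Example: resolve a reference against a temporary project directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("print('hello')\n")
    out = expand_file_refs("Refactor @app.py to use logging", root)

print("--- app.py ---" in out)  # the file body was inlined into the prompt
```

A real agent would also truncate large files and track which paths it has already inlined, but the core idea, substituting file bodies into the prompt before inference, is the same.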
Both models support a 256,000 token context window, allowing them to process moderately large codebases [1]. Despite its smaller footprint, Devstral Small 2 competes with models five times larger, making it the strongest open-weight model of its size [3]. This efficiency eliminates the latency associated with cloud-based AI systems while maintaining strong capabilities for production-grade workflows.

Devstral 2 is currently free to use through Mistral AI's API during an initial period [1]. After the free period ends, API pricing will be $0.40 per million input tokens and $2.00 per million output tokens, while Devstral Small 2 will cost $0.10 per million input tokens and $0.30 per million output tokens [2]. Mistral AI claims this makes Devstral 2 approximately seven times more cost-efficient than Claude Sonnet at real-world tasks; Anthropic's Sonnet 4.5 is priced at $3 per million input tokens and $15 per million output tokens through the API [1].
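The published per-token prices make the cost-efficiency claim easy to sanity-check. The workload below (2 million input tokens, 0.5 million output tokens) is an illustrative assumption, not a figure from Mistral:

```python
# Sanity check of the ~7x cost-efficiency claim from published per-token
# prices (USD per million tokens). The workload sizes are illustrative.
PRICES = {
    "Devstral 2":        {"input": 0.40, "output": 2.00},
    "Devstral Small 2":  {"input": 0.10, "output": 0.30},
    "Claude Sonnet 4.5": {"input": 3.00, "output": 15.00},
}

def job_cost(model: str, input_m: float, output_m: float) -> float:
    """Cost in USD for a job measured in millions of tokens."""
    p = PRICES[model]
    return input_m * p["input"] + output_m * p["output"]

devstral = job_cost("Devstral 2", 2.0, 0.5)        # ~1.80 USD
sonnet = job_cost("Claude Sonnet 4.5", 2.0, 0.5)   # 13.50 USD
print(f"Sonnet/Devstral cost ratio: {sonnet / devstral:.1f}x")
```

On raw per-token prices alone the ratio comes out around 7.5x for this mix, broadly consistent with the "7x" claim; Mistral's own figure presumably also factors in how many tokens each model consumes per task.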
Free access is also available through platforms like Kilo Code and OpenRouter, ensuring developers with limited financial resources can leverage the model's capabilities [5]. Mistral AI has partnered with the agent tools Kilo Code and Cline to bring Devstral 2 to users, while Mistral Vibe CLI is available as an extension in Zed for use inside the IDE [2].
The two models ship under different open-source licenses reflecting distinct use cases. Devstral Small 2 uses the Apache 2.0 license, which imposes no revenue restrictions and allows enterprises to use it in production, embed it into products, and redistribute fine-tuned versions without requiring permission [3]. Devstral 2 operates under a modified MIT license that introduces a critical limitation: companies generating more than $20 million in monthly revenue cannot use the model without securing a separate commercial license from Mistral AI [3].

This licensing structure allows enterprises to adopt a hybrid workflow, using the larger 123 billion parameter model for complex architectural planning while deploying the smaller 24 billion parameter version for rapid, private code changes within their own network boundaries [4].
SWE-bench Verified presents AI systems with 500 real software engineering problems pulled from GitHub issues in popular Python repositories, requiring models to read issue descriptions, navigate codebases, and generate working patches that pass unit tests [1]. While some researchers note that approximately 90% of the benchmark tasks test relatively simple bug fixes that experienced engineers could complete in under an hour, employees at major AI companies pay close attention to SWE-bench Verified performance, as it remains one of the few standardized ways to compare coding models.
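The evaluation loop the benchmark implies can be sketched in miniature. This toy harness is not the actual SWE-bench code, which applies git patches and runs each repository's own test suite; here a "patch" is a simple string transformation, and the candidate passes only if the unit test passes afterward:

```python
# Toy model of the SWE-bench loop: apply a candidate patch, then accept it
# only if the unit tests pass. Real SWE-bench applies git patches and runs
# each repository's full test suite; this is a minimal stand-in.
from typing import Callable

def evaluate_patch(
    source: str,
    patch: Callable[[str], str],
    tests: Callable[[str], bool],
) -> bool:
    """Return True if the patched source passes the unit tests."""
    patched = patch(source)
    return tests(patched)

# Buggy function pulled from a hypothetical issue: should add, but subtracts.
buggy = "def add(a, b):\n    return a - b\n"

def fix(src: str) -> str:
    """The model's proposed patch, expressed as a text transformation."""
    return src.replace("a - b", "a + b")

def run_tests(src: str) -> bool:
    """The repository's unit test: add(2, 3) must equal 5."""
    ns: dict = {}
    exec(src, ns)
    return ns["add"](2, 3) == 5

print(evaluate_patch(buggy, fix, run_tests))  # True: the patch resolves the issue
```

The pass/fail signal is binary per task, which is why scores like 72.2% translate directly into "share of issues resolved."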
Source: Digit
In human evaluations, Devstral 2 beat DeepSeek V3.2 in 42.8% of tasks and lost only 28.6% of them, despite being five times smaller [3]. Against Claude Sonnet 4.5, it lost more often, at 53.1%, indicating that closed proprietary models still lead in overall preference [3]. The model is also eight times smaller than Kimi K2 while matching or surpassing it on key software reasoning benchmarks.

Mistral AI launches Devstral 2 as competition in the vibe coding segment intensifies, with major players building increasingly closed ecosystems. Google recently announced a partnership with Replit Inc., bundling its Gemini 3 Pro model with the new AntiGravity IDE and cloud compute into a single stack designed to lock developers into its ecosystem. Anthropic is reportedly building a dedicated runtime for its coding agents following its acquisition of the JavaScript toolkit Bun.
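Since the head-to-head results report only win and loss shares, the remainder is presumably ties; a quick calculation makes the implied breakdown explicit (treating the shares as summing to 100% is an assumption about how the study was scored):

```python
# Implied tie rate from reported head-to-head preference shares,
# assuming win + loss + tie = 100% (scoring details are an assumption).
def tie_rate(win_pct: float, loss_pct: float) -> float:
    return round(100.0 - win_pct - loss_pct, 1)

print(tie_rate(42.8, 28.6))  # Devstral 2 vs DeepSeek V3.2 -> 28.6% ties
```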
Developer Simon Willison, speaking to Ars Technica in March, offered a balanced perspective on vibe coding: "I really enjoy vibe coding. It's a fun way to try out an idea and prove if it can work." However, he warned that "vibe coding your way to a production codebase is clearly risky. Most of the work we do as software engineers involves evolving existing systems, where the quality and understandability of the underlying code is crucial" [1].

Mistral AI, currently valued at €11.7 billion (approximately $13.8 billion) following a Series C funding round led by Dutch semiconductor company ASML, which invested €1.3 billion (approximately $1.5 billion) in September, positions itself as Europe's champion AI lab [2]. With Devstral 2, the company gives developers an alternative that combines open weights with local deployment options, sitting firmly outside the walled gardens being constructed by many rivals. Both models are available now through Mistral's console, chatbot, or third-party integrations, with the full Devstral 2 model supported in the community inference provider vLLM and on the open-source agentic coding platform Kilo Code [3].

Summarized by Navi