9 Sources
[1]
Nvidia launches Nemotron 3 Super to power enterprise AI agents
The 120B parameter model aims to improve compute efficiency and accuracy for complex multi-agent workloads such as software development and cybersecurity triage. Nvidia has introduced a new reasoning-focused AI model that combines multiple neural network architectures in a bid to improve how enterprise systems handle complex tasks and automation. The company said its Nemotron 3 Super model combines Mamba sequence modeling, transformer attention, and Mixture-of-Experts routing to support so-called "agentic" AI systems that can plan and execute multi-step workflows across enterprise applications. In a statement, Nvidia said multi-agent systems can generate up to 15 times more tokens than standard chat interactions. This can lead to "context explosion," which may cause agents to drift from the original goal and raise costs, as large reasoning models are used for each subtask.
[2]
New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI
A new, open, 120-billion-parameter hybrid mixture-of-experts model optimized for NVIDIA Blackwell addresses the costs of long thinking and context explosion that slow autonomous agent workflows. Launched today, NVIDIA Nemotron 3 Super is a 120-billion-parameter open model with 12 billion active parameters, designed to run complex agentic AI systems at scale. Available now, the model pairs advanced reasoning with high task accuracy for autonomous agents.

AI-Native Companies: Perplexity offers its users access to Nemotron 3 Super for search and as one of 20 orchestrated models in Computer. Companies offering software development agents, such as CodeRabbit, Factory and Greptile, are integrating the model into their AI agents alongside proprietary models to achieve higher accuracy at lower cost. Life sciences and frontier AI organizations, including Edison Scientific and Lila Sciences, will use it to power agents for deep literature search, data science and molecular understanding.

Enterprise Software Platforms: Industry leaders such as Amdocs, Palantir, Cadence, Dassault Systèmes and Siemens are deploying and customizing the model to automate workflows in telecom, cybersecurity, semiconductor design and manufacturing.

As companies move beyond chatbots and into multi-agent applications, they encounter two constraints. The first is context explosion. Multi-agent workflows generate up to 15x more tokens than standard chat because each interaction requires resending full histories, including tool outputs and intermediate reasoning. Over long tasks, this volume of context increases costs and can lead to goal drift, where agents lose alignment with the original objective. The second is the thinking tax. Complex agents must reason at every step, but using large models for every subtask makes multi-agent systems too expensive and sluggish for practical use.
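The quoted 15x figure can be seen with a simple back-of-the-envelope model. The sketch below uses illustrative assumptions (a flat 500 tokens per turn), not NVIDIA's accounting; it shows how resending the full accumulated history at every agent step makes total token volume grow quadratically with step count:

```python
# Illustrative sketch (not NVIDIA's implementation): why multi-agent
# workflows inflate token counts. Each step resends the full history,
# so total tokens grow quadratically with the number of steps.

def chat_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """A plain chat sends each turn once."""
    return turns * tokens_per_turn

def agent_tokens(steps: int, tokens_per_step: int = 500) -> int:
    """An agent resends the whole accumulated history at every step."""
    total = 0
    history = 0
    for _ in range(steps):
        history += tokens_per_step  # new tool output / reasoning appended
        total += history            # full history goes back to the model
    return total

if __name__ == "__main__":
    chat = chat_tokens(10)
    agent = agent_tokens(10)
    print(f"chat: {chat} tokens, agent: {agent} tokens, "
          f"ratio: {agent / chat:.1f}x")  # → ratio: 5.5x
```

With ten 500-token steps the agent path already costs 5.5x the plain chat; under these assumptions the ratio grows as (steps + 1) / 2, crossing 15x near 29 steps.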
Nemotron 3 Super has a 1-million-token context window, allowing agents to retain full workflow state in memory and preventing goal drift. The model has set new standards, claiming the top spot on Artificial Analysis for efficiency and openness, with leading accuracy among models of the same size. It also powers the NVIDIA AI-Q research agent to the No. 1 position on the DeepResearch Bench and DeepResearch Bench II leaderboards, benchmarks that measure an AI system's ability to conduct thorough, multistep research across large document sets while maintaining reasoning coherence.

Hybrid Architecture

Nemotron 3 Super uses a hybrid mixture-of-experts (MoE) architecture that combines several innovations to deliver up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model:

* Hybrid backbone: Mamba layers deliver 4x higher memory and compute efficiency, while transformer layers drive advanced reasoning.
* MoE: Only 12 billion of its 120 billion parameters are active at inference.
* Latent MoE: A new technique that improves accuracy by activating four expert specialists for the cost of one to generate the next token at inference.
* Multi-Token Prediction: Predicts multiple future tokens simultaneously, resulting in 3x faster inference.

On the NVIDIA Blackwell platform, the model runs in NVFP4 precision, which cuts memory requirements and pushes inference up to 4x faster than FP8 on NVIDIA Hopper, with no loss in accuracy.

Open Weights, Data and Recipes

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in data centers or in the cloud. The model was trained on synthetic data generated using frontier reasoning models. NVIDIA is publishing the complete methodology, including over 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning, and evaluation recipes.
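The Latent MoE idea described above can be illustrated with a toy routing sketch. Everything here is invented for illustration (the dimensions, the `latent_moe` function, the random weight matrices); it is not Nemotron's implementation, only the general pattern of compressing tokens before expert routing:

```python
# Hypothetical sketch of the "Latent MoE" idea: project tokens into a
# smaller latent space before routing, so consulting several experts
# costs roughly the same FLOPs as one full-width expert.
# Dimensions and expert counts are illustrative, not Nemotron's config.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 1024, 256, 16, 4

W_down = rng.standard_normal((d_model, d_latent)) * 0.02   # compress
W_up = rng.standard_normal((d_latent, d_model)) * 0.02     # decompress
router = rng.standard_normal((d_latent, n_experts)) * 0.02
experts = [rng.standard_normal((d_latent, d_latent)) * 0.02
           for _ in range(n_experts)]

def latent_moe(x: np.ndarray) -> np.ndarray:
    z = x @ W_down                         # route and compute in latent space
    logits = z @ router
    top = np.argsort(logits)[-top_k:]      # pick top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()
    z_out = sum(w * (z @ experts[i]) for w, i in zip(weights, top))
    return z_out @ W_up                    # project back to model width

y = latent_moe(rng.standard_normal(d_model))
# Each latent expert matmul costs d_latent^2 = 65,536 multiplies versus
# d_model^2 = 1,048,576 for a full-width expert, so 4 latent experts cost
# about a quarter of one dense expert.
print(y.shape)  # → (1024,)
```

This is what makes the "four specialists for the cost of one" framing plausible: the routing and expert math happen at a quarter of the hidden width, so expert count can rise without a matching rise in compute.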
Researchers can further use the NVIDIA NeMo platform to fine-tune the model or build their own.

Use in Agentic Systems

Nemotron 3 Super is designed to handle complex subtasks inside a multi-agent system. A software development agent can load an entire codebase into context at once, enabling end-to-end code generation and debugging without document segmentation. In financial analysis, it can load thousands of pages of reports into memory, eliminating the need to re-reason across long conversations and improving efficiency. High-accuracy tool calling helps autonomous agents reliably navigate massive function libraries and prevent execution errors in high-stakes environments, such as autonomous security orchestration in cybersecurity.

Availability

NVIDIA Nemotron 3 Super, part of the Nemotron 3 family, can be accessed at build.nvidia.com, Perplexity, OpenRouter and Hugging Face. Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face, optimized for on-premises deployment on the Dell AI Factory, advancing multi-agent AI workflows. HPE is also bringing NVIDIA Nemotron to its agents hub to support scalable enterprise adoption of agentic AI. Enterprises and developers can deploy the model through several partners:

* Cloud Service Providers: Google Cloud's Vertex AI and Oracle Cloud Infrastructure, with Amazon Web Services (via Amazon Bedrock) and Microsoft Azure coming soon.
* NVIDIA Cloud Partners: CoreWeave, Crusoe, Nebius and Together AI.
* Inference Service Providers: Baseten, Cloudflare, DeepInfra, Fireworks AI, Inference.net, Lightning AI, Modal and FriendliAI.
* Data Platforms and Services: Distyl, Dataiku, DataRobot, Deloitte, EY and Tata Consultancy Services.

The model is packaged as an NVIDIA NIM microservice, allowing deployment from on-premises systems to the cloud.
Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community, and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.
[3]
Nvidia's new open-weights Nemotron 3 Super combines three different architectures to beat gpt-oss and Qwen in throughput
Multi-agent systems, designed to handle long-horizon tasks like software engineering or cybersecurity triage, can generate up to 15 times the token volume of standard chats, threatening their cost-effectiveness on enterprise tasks. But today, Nvidia sought to help solve this problem with the release of Nemotron 3 Super, a 120-billion-parameter hybrid model, with weights posted on Hugging Face. By merging disparate architectural philosophies (state-space models, transformers, and a novel "Latent" mixture-of-experts design), Nvidia is attempting to provide the specialized depth required for agentic workflows without the bloat typical of dense reasoning models, all available for commercial use under a largely open license.

At the core of Nemotron 3 Super is a sophisticated architectural triad that balances memory efficiency with precision reasoning. The model uses a hybrid Mamba-Transformer backbone, which interleaves Mamba-2 layers with strategic Transformer attention layers. To understand the implications for enterprise production, consider the "needle in a haystack" problem. Mamba-2 layers act like a "fast-travel" highway system, handling the vast majority of sequence processing with linear-time complexity. This allows the model to maintain a massive 1-million-token context window without the memory footprint of the KV cache exploding. However, pure state-space models often struggle with associative recall. To fix this, Nvidia strategically inserts Transformer attention layers as "global anchors," ensuring the model can precisely retrieve specific facts buried deep within a codebase or a stack of financial reports.

Beyond the backbone, the model introduces Latent Mixture-of-Experts (LatentMoE). Traditional Mixture-of-Experts (MoE) designs route tokens to experts in their full hidden dimension, which creates a computational bottleneck as models scale.
LatentMoE solves this by projecting tokens into a compressed space before routing them to specialists. This "expert compression" allows the model to consult four times as many specialists for the same computational cost. Such granularity is vital for agents that must switch between Python syntax, SQL logic, and conversational reasoning within a single turn.

Further accelerating the model is Multi-Token Prediction (MTP). While standard models predict a single next token, MTP predicts several future tokens simultaneously. This serves as a built-in draft model, enabling native speculative decoding that can deliver up to 3x wall-clock speedups for structured generation tasks like code or tool calls.

For enterprises, the most significant technical leap in Nemotron 3 Super is its optimization for the Nvidia Blackwell GPU platform. By pre-training natively in NVFP4 (4-bit floating point), Nvidia has achieved a breakthrough in production efficiency. On Blackwell, the model delivers 4x faster inference than 8-bit models running on the previous Hopper architecture, with no loss in accuracy.

In practical performance, Nemotron 3 Super is a specialized tool for agentic reasoning. It currently holds the No. 1 position on DeepResearch Bench, a benchmark measuring an AI's ability to conduct thorough, multi-step research across large document sets. It also demonstrates significant throughput advantages, achieving up to 2.2x higher throughput than gpt-oss-120B and 7.5x higher than Qwen3.5-122B in high-volume settings.

The release of Nemotron 3 Super under the Nvidia Open Model License Agreement (updated October 2025) provides a permissive framework for enterprise adoption, though it carries distinct "safeguard" clauses that differentiate it from pure open-source licenses like MIT or Apache 2.0.
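The speculative-decoding loop that MTP enables can be sketched with stand-in functions. In this toy, `draft` and `verify` are hypothetical placeholders (not Nemotron APIs) that happen to always agree, so every pass accepts the full proposal; real acceptance rates are lower, which is why the speedup is "up to" 3x rather than a guarantee:

```python
# Toy sketch of speculative decoding with a built-in draft, the mechanism
# MTP enables: propose several future tokens at once, then let a single
# verification pass keep the longest matching prefix.

def draft(prefix: list[int], k: int = 4) -> list[int]:
    """Stand-in MTP head: cheaply propose k future tokens."""
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def verify(prefix: list[int], proposed: list[int]) -> list[int]:
    """Stand-in full model: accept the longest prefix it agrees with."""
    accepted = []
    for i, tok in enumerate(proposed):
        target = (prefix[-1] + i + 1) % 100  # what the full model would emit
        if tok != target:
            break
        accepted.append(tok)
    return accepted

out = [7]                                    # starting token
for _ in range(3):                           # 3 verification passes
    accepted = verify(out, draft(out))
    out += accepted or [(out[-1] + 1) % 100] # fall back to 1 token per pass
print(out)                                   # 12 new tokens from 3 passes
```

The payoff is in the loop count: three full-model passes produced twelve tokens here, whereas ordinary autoregressive decoding would have needed twelve passes.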
The license includes two critical termination triggers that production teams must monitor. This structure allows Nvidia to foster a commercial ecosystem while protecting itself from "IP trolling" and ensuring that the model isn't stripped of its safety features for malicious use.

The release has generated significant buzz within the developer community. Chris Alexiuk, a Senior Product Research Engineer at Nvidia, heralded the launch on X under his handle @llm_wizard as a "SUPER DAY," emphasizing the model's speed and transparency. "Model is: FAST. Model is: SMART. Model is: THE MOST OPEN MODEL WE'VE DONE YET," he posted, highlighting the release of not just weights but 10 trillion tokens of training data and recipes. Industry adoption reflects this enthusiasm. As Kari Briski, Nvidia VP of AI Software, noted: "As companies move beyond chatbots and into multi-agent applications, they encounter... context explosion." Nemotron 3 Super is Nvidia's answer to that explosion: a model that provides the "brainpower" of a 120B-parameter system with the operational efficiency of a much smaller specialist. For the enterprise, the message is clear: the "thinking tax" is finally coming down.
[4]
Nvidia Drops Nemotron 3 Super Amid $26 Billion Open-Model AI Bet -- America's Answer to Qwen? - Decrypt
Nvidia's $26 billion investment in open-source AI aims to counter China's rise in the field. Nvidia just shipped Nemotron 3 Super, a 120-billion-parameter open-weight model built to do one thing well: run autonomous AI agents without bleeding your compute budget dry.

That's not a small problem. Multi-agent systems generate far more tokens than a normal chat: every tool call, reasoning step, and slice of context gets re-sent from scratch. As a result, costs explode, models tend to drift, and the agents slowly lose track of what they were supposed to be doing, or at least decrease in accuracy. Nemotron 3 Super is Nvidia's answer to all of that.

The model runs 12 billion active parameters out of 120 billion total, using a mixture-of-experts (MoE) design that keeps inference cheap while retaining the reasoning depth complex workflows need. It packs a 1-million-token context window, so agents can hold an entire codebase, or nearly 750,000 words, in memory at once.

To build the model, Nvidia combined three components that rarely appear together in the same architecture: Mamba-2 state-space layers, a faster, memory-efficient alternative to attention for handling long token streams; Transformer attention layers for precise recall; and a new "Latent MoE" design that compresses token embeddings before routing them to experts. That allows the model to activate four times as many specialists at the same compute cost.

The model was also pretrained natively in NVFP4, Nvidia's 4-bit floating-point format. In practice, that means the system learned to operate accurately within 4-bit arithmetic from the very first gradient update, rather than being trained at high precision and compressed afterward, which often costs models accuracy. For context, a model's precision is measured in bits. Full precision, known as FP32, is the gold standard, but it is also extremely expensive to run at scale.
Developers often reduce precision to save compute while trying to preserve useful performance. Think of it like shrinking a 4K image down to 1080p: the picture still looks the same at a glance, just with less detail. Normally, dropping from 32-bit precision all the way to 4-bit would cripple a model's reasoning ability. Nemotron avoids that problem by learning to operate at low precision from the start, instead of being squeezed into it later.

Compared with its own predecessor, Nemotron 3 Super delivers more than five times the throughput. Against external rivals, it's 2.2x faster than OpenAI's GPT-OSS 120B on inference throughput, and 7.5x faster than Alibaba's Qwen3.5-122B. We ran our own quick test. The reasoning held up well, including on prompts that were deliberately vague, badly worded, or based on wrong information. The model caught small errors in context without being asked to, handled math and logic problems cleanly, and didn't fall apart when the question itself was slightly off.

The full training pipeline is public: weights on Hugging Face, 10 trillion curated pretraining tokens (out of 25 trillion seen during training), 40 million post-training samples, and reinforcement learning recipes across 21 environment configurations. Perplexity, Palantir, Cadence, and Siemens are already integrating the model into their workflows.

The model may be one piece of a larger strategy. A 2025 financial filing shows Nvidia plans to spend $26 billion over the next five years building open-weight AI models. Executives have confirmed it, too: Bryan Catanzaro, VP of applied deep learning research, told Wired the company recently finished pretraining a 550-billion-parameter model. Nvidia released its first Nemotron model back in November 2023, but that filing makes clear this is no longer a side project. The investment is strategic, considering Nvidia's chips are still the default infrastructure for training and running frontier models.
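The precision trade-off described above can be made concrete with a naive rounding experiment. The sketch below simulates simple uniform quantization, not the NVFP4 format itself, purely to show why fewer bits means more rounding error, which is the loss native low-precision training is designed to absorb:

```python
# Rough illustration of the precision trade-off: quantize weights to a
# coarse 4-bit-style grid and measure the error. This is naive post-hoc
# rounding, the approach native NVFP4 training avoids; the scheme and
# numbers are illustrative, not NVIDIA's NVFP4 specification.
import numpy as np

rng = np.random.default_rng(42)
w = rng.standard_normal(10_000).astype(np.float32)  # stand-in weights

def fake_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to a signed bits-wide grid, then
    dequantize back, so the rounding error is directly visible."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels per sign at 4 bits
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

for bits in (8, 4):
    err = np.abs(w - fake_quantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Running this shows the 4-bit grid's mean error is roughly an order of magnitude worse than the 8-bit grid's, which is exactly the gap a model must be trained through (rather than compressed into) to keep its accuracy.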
Models tuned to its hardware give customers a built-in reason to stay on Nvidia despite competitors' efforts to lure them onto other hardware. But there's a more urgent pressure behind the move: America is losing the open-source AI race, and losing it fast.

Chinese open models went from barely 1.2% of global open-model usage in late 2024 to roughly 30% by the end of 2025, according to research by OpenRouter and Andreessen Horowitz. Alibaba's Qwen overtook Meta's Llama as the most-used self-hosted open-source model, according to Runpod. American companies including Airbnb adopted it for customer service. Startups worldwide are building on top of it. Beyond market share, that kind of adoption creates infrastructure dependencies that are hard to reverse.

While U.S. giants like OpenAI, Anthropic, and Google keep their best models locked behind APIs, Chinese labs from DeepSeek to Alibaba have been flooding the open ecosystem. Meta was the one major American player competing in open source with Llama, but Zuckerberg recently signaled the company might not make future models fully open. The gap between the best proprietary model and the best open model used to be massive, and in America's favor. That gap is now very small, and the open side of the ledger is increasingly Chinese.

There's also a hardware threat underneath all of this. A new DeepSeek model is widely expected to drop soon, and it's rumored to have been trained entirely on chips made by Huawei, a sanctioned Chinese company. If confirmed, that would give developers around the world, particularly in China, a concrete reason to start testing Huawei's hardware. China's Zhipu AI is already doing that. That's the scenario Nvidia most needs to prevent: Chinese open models and Chinese chips building an ecosystem that doesn't need Nvidia at all.
[5]
Nvidia's Nemotron 3 Super model for agentic systems launches with five times higher throughput - SiliconANGLE
With so much talk about its upcoming Vera Rubin graphics processing units, it's easy to forget that Nvidia Corp. doesn't just supply the hardware for artificial intelligence. It also develops its own series of AI models, and today it announced the availability of its most capable model so far. The company said Nemotron 3 Super is aimed at running complex agentic AI systems at large scale, combining advanced reasoning with rapid processing speeds to efficiently perform tasks that require extreme accuracy.

Nemotron 3 Super is a 120-billion-parameter open model based on a hybrid mixture-of-experts architecture. It combines three innovations to achieve up to five times higher throughput and twice the accuracy of the previous-generation Nemotron Super model, Nvidia said.

According to Nvidia, Nemotron 3 Super is designed to tackle two major constraints facing agentic AI systems that aim to automate complex tasks on behalf of their users. The first is an explosion of context. Nvidia said multi-agent workflows typically generate up to 15 times more tokens than standard chat interactions, because each time a user interacts with one, the model needs to resend context including tool outputs and intermediate reasoning. The second constraint is known as the "thinking tax": complex agents must reason at every step of a task, and the more parameters a model has, the more expensive and slower each of those reasoning steps becomes, making it impractical to use very large models throughout.

To get around these problems, Nemotron 3 Super has a one-million-token context window that allows it to retain full workflow state in memory and prevent "goal drift," Nvidia said.
Moreover, only 12 billion of its 120 billion parameters are active during inference, the process of running trained models to generate predictions on new, unseen data. Nvidia said Nemotron 3 Super runs in NVFP4 precision on its Blackwell GPUs, which reduces its memory requirements and speeds up inference by up to four times compared with its previous-generation Hopper platform.

Nemotron 3 Super is widely accessible and can be downloaded from build.nvidia.com, OpenRouter and Hugging Face. In addition, the AI search engine Perplexity Inc. is making the model available in its search engine and with its "Computer" AI agent system. Generative AI coding applications such as CodeRabbit, Factory and Greptile are adding the model to their lineups, while the life sciences organizations Edison Scientific and Lila Sciences will use it to power agents for data science, deep literature research and molecular understanding. Companies including Amdocs, Palantir Technologies Inc., Cadence Design Systems Inc. and Dassault Systèmes SA are also using Nemotron 3 Super to automate workflows in telecommunications, cybersecurity, semiconductor design and manufacturing. Finally, Dell Technologies Inc. and Hewlett Packard Enterprise Co. will offer access to the model through their respective agent hubs, Nvidia said.

The launch of Nemotron 3 Super comes ahead of Nvidia's annual GTC conference, set to kick off on March 16, where the company is expected to unveil its next-generation GPU platforms.
[6]
Nvidia launches 120B parameter Nemotron 3 Super open model
Nvidia launched Nemotron 3 Super, a 120-billion-parameter open-weight model designed for large-scale agentic AI systems. The company announced the release on Wednesday, positioning the model for speed and efficiency. The model targets enterprise deployment for complex tasks requiring long-context processing, and it is available immediately on several major platforms, expanding access for developers and researchers.

Nemotron 3 Super is available on build.nvidia.com, Perplexity, OpenRouter, and Hugging Face. Enterprises can access it via Google Cloud's Vertex AI, Oracle Cloud Infrastructure, and soon on Amazon Bedrock and Microsoft Azure. The model is also accessible on CoreWeave, Crusoe, Nebius, and Together AI.

The model uses a hybrid latent mixture-of-experts and Mamba-Transformer architecture. This structure allows it to use four times as many expert specialists during inference as previous models at the same cost. Nvidia trained the model on synthetic data from other frontier reasoning models. The company is publishing over 10 trillion tokens of training data and 15 training environments with this release.

According to Artificial Analysis benchmarks, the model scores 36 for overall intelligence. This places it above gpt-oss-120B at 33 points but behind Gemini 3.1 Pro and GPT-5.4, which both score 57. The model achieves a speed of 478 output tokens per second, making it the fastest model available. Nvidia states it achieves 7.5x higher inference throughput than Qwen3.5-122B.

Nvidia has not announced a release date for Nemotron 3 Ultra, the family's largest 500-billion-parameter model. The company teased the Ultra variant in its original announcement last year.
[7]
Nvidia's New Open-Source AI Model Is Designed for Agentic Workflows
Nvidia released a new open-source artificial intelligence (AI) model designed to handle complex agentic workflows. Dubbed Nemotron 3 Super, it is a hybrid mixture-of-experts (MoE) model that combines advanced reasoning capabilities and is said to complete tasks with high accuracy for autonomous agents. The new model is already being deployed by several AI firms, including Perplexity, for its new agentic Computer platform. It is also hosted on public repositories so that interested individuals can download and run the model locally.

Nvidia's Nemotron 3 Super AI Model Released

In a blog post, the tech giant announced and detailed the new open-source AI model. Part of the Nemotron 3 family, Nemotron 3 Super is currently hosted on Nvidia's website, the Hugging Face platform, Perplexity, and OpenRouter. It is also being brought to the Dell Enterprise Hub and is optimised for on-premise deployment on the Dell AI Factory.

The latest model addresses the problem of context explosion and the increased cost of reasoning. AI models developed for agentic workflows tend to generate a higher number of tokens, as each interaction between agents or sub-agents requires sending the full context. Similarly, executing complex tasks requires multi-level thinking, which can substantially drive up the cost of running the model.

With its hybrid architecture, Nemotron 3 Super has a total of 120 billion parameters and 12 billion active parameters. It also gets a context window of one million tokens, which allows agents to retain full workflow memory. Its development also utilised a technique dubbed Latent MoE, which improves accuracy by activating four experts for the cost of one when generating the next token at inference. The tech giant said it is releasing the model with open weights under a permissive licence.
On the dataset and training, the company says Nemotron 3 Super was trained on synthetic data generated using frontier reasoning models. Nvidia said it is publishing the complete methodology, including more than 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning, and evaluation recipes.
[8]
NVIDIA Nemotron 3 Super & Nemoclaw Target Safer AI Agents at Scale
Nvidia's recent unveiling of the Nemotron 3 Super model and the Nemoclaw platform marks a significant moment in the evolution of AI agents. Covered in the video below by Julian Goldie, Nemotron 3 Super introduces a hybrid mixture-of-experts architecture, activating only 12 billion of its 120 billion parameters per task to balance efficiency and performance. Notably, its 1-million-token context window allows AI agents to handle complex, multi-step tasks without losing coherence, addressing a longstanding challenge in AI development. Paired with the Nemoclaw platform, which emphasizes security and scalability, these advancements aim to meet the growing demand for enterprise-grade AI solutions. Explore how these innovations enable faster responses through multi-token prediction, enhance enterprise adoption with hardware-agnostic compatibility, and support the deployment of persistent AI agents for continuous automation. Gain insight into the implications of Nvidia's open-source approach and how it fosters adaptability across industries. This feature outlines the key takeaways from Nvidia's contributions and what they mean for the future of AI-driven workflows.

The Nemotron 3 Super model represents a major step forward in AI architecture. With an impressive 120 billion parameters, it employs a hybrid mixture-of-experts design that activates only 12 billion parameters per task. This approach significantly reduces computational costs while maintaining high levels of performance, making it both efficient and scalable. These qualities position Nemotron 3 Super as a versatile and efficient tool for tasks ranging from natural language processing to autonomous decision-making, making it a valuable asset for businesses and developers alike. The Nemoclaw platform complements the Nemotron 3 Super model by providing a secure and scalable environment for deploying AI agents.
Designed with enterprise-grade security, it addresses vulnerabilities commonly associated with open-source AI solutions, ensuring robust protection for sensitive data and workflows. By combining security, scalability and hardware flexibility, Nemoclaw provides a reliable foundation for businesses aiming to implement advanced AI solutions in their operations.

OpenClaw, an open-source AI agent, has played a pivotal role in the development of specialized variants such as NanoClaw, ZeroClaw and IronClaw. These versions cater to specific needs, ranging from lightweight applications to high-performance tasks. However, concerns about security vulnerabilities in OpenClaw have prompted many enterprises to seek more secure alternatives, such as Nemoclaw. The open-source AI ecosystem underscores the growing demand for customizable and scalable AI agents. Nvidia's contributions, including Nemotron 3 Super and Nemoclaw, address these demands while setting new standards for security, efficiency and adaptability. This dual approach ensures that both developers and enterprises have access to tools that meet their specific requirements.

Nvidia's advancements have captured the attention of leading technology companies. Industry giants such as Google, Oracle and Salesforce are exploring partnerships to integrate AI agents into their operations, aiming to enhance productivity and streamline workflows. Meanwhile, Chinese technology leaders like Tencent and Alibaba are advancing their own AI agent initiatives, targeting both consumer and enterprise markets. The competitive landscape has intensified further with OpenAI's acquisition of OpenClaw, signaling a race to dominate the AI agent ecosystem. Nvidia's comprehensive approach, spanning hardware, models, platforms and benchmarks, positions it as a key player in this rapidly evolving field.
Looking ahead, Nvidia's strategy focuses on building a robust AI stack that caters to the diverse needs of developers and enterprises. The shift from query-based AI, such as chatbots, to task-based AI, exemplified by autonomous agents, is transforming workflows across industries. Persistent AI agents require substantial computational resources, aligning with Nvidia's hardware solutions to ensure seamless integration and performance.

As of March 2026, the AI agent ecosystem is poised for further growth and innovation. Nvidia's upcoming GTC 2026 conference is expected to showcase additional advancements, including the official launch of the Nemoclaw platform and potential new hardware developments. These announcements are likely to reinforce Nvidia's position as a leader in the AI space. The increasing accessibility of AI agents is set to accelerate their adoption across industries, driving automation and enhancing productivity. By automating repetitive tasks and enabling more efficient workflows, these technologies are reshaping industries and unlocking new opportunities for innovation. Nvidia's contributions, particularly Nemotron 3 Super and the Nemoclaw platform, are at the forefront of this transformation, offering secure, efficient and scalable solutions for the future of AI-driven automation.
[9]
NVIDIA Unveils Nemotron 3 Super as an Open Agentic AI Model, and It Could Be the Perfect Choice for OpenClaw
NVIDIA's Nemotron class of open-source LLMs just got significantly enhanced with the latest release, Nemotron 3 Super, which targets agentic AI workloads with its extensive context window. For those unaware, when we talk about the leading contributors to the world of open-source AI models, some might think of Chinese AI labs like Kimi or Qwen, but in reality, NVIDIA's Nemotron suite leads the way. As AI is distributed across a "five-layer" cake, NVIDIA has not only dominated infrastructure and chips but is also one of the few in the West to have heavily invested in open-source models. With that, NVIDIA has now unveiled Nemotron 3 Super, with the main idea being to run agentic AI applications at scale, making it ideal for agents like OpenClaw.

One of the standout aspects of Nemotron 3 Super is NVIDIA's hybrid Mamba-MoE architecture. Compared with a traditional transformer backbone, Mamba is an impressive implementation: NVIDIA has changed how the LLM processes the data flow. With the newer architecture, Mamba relies on a State Space Model (SSM) to read data linearly, preventing the context from ballooning with irrelevant information.

Mamba-MoE allows Nemotron 3 Super to maintain an optimal context window for user workloads, yielding the best agentic responses. - NVIDIA

The Mamba layers deliver 4x higher memory efficiency alongside advanced reasoning, making Nemotron 3 Super well suited to inference workloads. Another impressive feature of Nemotron 3 Super is its 1-million-token context window, four times the size of the one in Kimi 2.5. A common rule of thumb in agentic systems holds that the bigger the window, the better the response. This is why, in this respect alone, Nemotron 3 Super leads all other open-source LLMs and even comes close to the likes of Opus 4.5, despite being limited to just 120 billion parameters.
Speaking of OpenClaw, NVIDIA tested Nemotron 3 Super on PinchBench, a suite used to evaluate agent workloads, and the model scored 85.6% across the full test suite, surpassing Opus 4.5, Kimi 2.5, and GPT-OSS 120b. For consumers running extensive workloads through OpenClaw, Nemotron 3 Super opens up an entirely new class of performance, with compute requirements that can be met by a single GPU. Nemotron 3 Super is one example of how capable agentic AI systems are becoming, and with LLMs now overcoming compute limitations, the future of model deployment at the edge is brighter than ever.
Nvidia unveiled Nemotron 3 Super, a 120-billion-parameter open model designed to run complex agentic AI systems at scale. The model combines Mamba, Transformer attention, and Latent MoE architectures to deliver up to 5x higher throughput and 2x better accuracy than its predecessor. It addresses context explosion and thinking tax challenges that plague multi-agent workflows, which generate up to 15x more tokens than standard chat interactions.

Nvidia has released Nemotron 3 Super, a 120-billion-parameter model specifically engineered to handle the computational demands of AI agents operating across enterprise environments. The model addresses two critical bottlenecks that have hindered agentic AI workflows: context explosion and what the company calls the "thinking tax."[1]

Multi-agent systems can generate up to 15 times more tokens than standard chat interactions, as each step requires resending full histories including tool outputs and intermediate reasoning.[2]
This volume increases costs dramatically and can cause agents to drift from their original objectives during long-horizon tasks like software development or cybersecurity triage.

At the core of Nemotron 3 Super lies a sophisticated hybrid mixture-of-experts design that merges three distinct architectural innovations. The model interleaves Mamba-2 state-space layers with Transformer attention layers, allowing it to maintain a 1-million-token context window without the memory-footprint explosion typical of pure attention mechanisms.[3]
Mamba layers deliver 4x higher memory and compute efficiency by handling sequence processing with linear-time complexity, while strategically placed Transformer attention layers ensure precise retrieval of specific facts buried deep within codebases or financial reports. Only 12 billion of its 120 billion parameters activate during inference, keeping computational costs manageable while retaining the reasoning depth required for complex workflows.[2]
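The memory argument behind this hybrid layout can be sketched with back-of-the-envelope arithmetic. The numbers below (layer count, head counts, the one-in-four attention ratio) are hypothetical illustrative values, not Nemotron's published configuration; the point is only that Mamba layers keep a fixed-size recurrent state instead of a per-token KV cache, so total cache memory scales with the number of attention layers.

```python
# Illustrative sketch (not NVIDIA's actual configuration): why interleaving
# linear-time Mamba layers with a few attention layers shrinks the KV cache
# that pure-attention models must hold for long contexts.

def kv_cache_bytes(n_attention_layers, context_tokens,
                   n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """KV cache = 2 tensors (K and V) per attention layer, per token."""
    per_token = 2 * n_kv_heads * head_dim * bytes_per_value
    return n_attention_layers * context_tokens * per_token

CONTEXT = 1_000_000      # 1M-token context window
TOTAL_LAYERS = 52        # hypothetical model depth

# Pure-attention transformer: every layer keeps a per-token KV cache.
pure = kv_cache_bytes(TOTAL_LAYERS, CONTEXT)

# Hybrid: suppose only 1 in 4 layers is attention; the Mamba layers keep
# a small fixed-size state that does not grow with context length.
hybrid = kv_cache_bytes(TOTAL_LAYERS // 4, CONTEXT)

print(f"pure attention KV cache: {pure / 1e9:.0f} GB")
print(f"hybrid (1/4 attention):  {hybrid / 1e9:.0f} GB")
print(f"memory ratio: {pure / hybrid:.1f}x")   # 4.0x under these assumptions
```

Under these made-up numbers the hybrid needs a quarter of the cache, which is the same order as the "4x higher memory efficiency" figure quoted for the Mamba layers; the real saving depends on the actual attention ratio and head configuration.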
The model introduces Latent MoE, a novel technique that projects tokens into a compressed space before routing them to specialists. This expert compression allows the system to consult four times as many specialists for the same computational cost as traditional mixture-of-experts designs.[3]
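The idea of routing in a compressed space can be illustrated with a toy example. This is a hypothetical sketch, not Nemotron's actual implementation: all dimensions, the top-k value, and the matrix names (`W_down`, `W_up`, `router`) are made up for illustration. The key move is that the router and the experts both operate on the smaller latent vector, so each expert matmul is far cheaper than it would be at full model width.

```python
# Hypothetical sketch of the "Latent MoE" idea: compress each token into a
# smaller latent space *before* expert routing, so routing and expert
# computation happen on cheap compressed vectors. Illustrative sizes only.
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, D_LATENT, N_EXPERTS, TOP_K = 1024, 256, 16, 4

W_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
W_up   = rng.standard_normal((D_LATENT, D_MODEL)) / np.sqrt(D_LATENT)
router = rng.standard_normal((D_LATENT, N_EXPERTS))
experts = [rng.standard_normal((D_LATENT, D_LATENT)) / np.sqrt(D_LATENT)
           for _ in range(N_EXPERTS)]

def latent_moe(token):                       # token: shape (D_MODEL,)
    z = token @ W_down                       # compress before routing
    logits = z @ router
    top = np.argsort(logits)[-TOP_K:]        # pick top-k experts
    t = logits[top] - logits[top].max()      # stable softmax over the top-k
    weights = np.exp(t) / np.exp(t).sum()
    z_out = sum(w * (z @ experts[i]) for w, i in zip(weights, top))
    return z_out @ W_up                      # project back to model width

out = latent_moe(rng.standard_normal(D_MODEL))
print(out.shape)

# Rough FLOP intuition: each expert matmul in latent space costs
# (D_LATENT / D_MODEL)^2 = 1/16 as much as a full-width expert, so many
# more experts can be consulted for a comparable budget (ignoring the
# down/up projections, which are shared across all experts).
```

With a 4x compression as above, each expert is roughly 16x cheaper, which is how a latent design can afford to consult several times as many specialists per token.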
Multi-Token Prediction further accelerates performance by predicting several future tokens simultaneously, delivering up to 3x wall-clock speedups for structured generation tasks.[2]
Compared to its predecessor, Nemotron 3 Super achieves up to 5x higher throughput and 2x better accuracy.

Nvidia optimized Nemotron 3 Super specifically for its Blackwell GPU platform, pretraining the model natively in NVFP4, a 4-bit floating-point format. This approach differs fundamentally from conventional methods that train models at high precision and compress them afterward, which often degrades accuracy.[4]
By learning to operate within 4-bit arithmetic from the first gradient update, the model maintains accuracy while cutting memory requirements. On Blackwell GPUs, the model runs up to 4x faster than 8-bit models on the previous Hopper architecture with no loss in accuracy.[2]
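The general shape of block-scaled 4-bit quantization can be sketched as below. This is a simplification, not the NVFP4 format itself: NVFP4 uses FP4 values with per-block scale factors, while this sketch uses a plain signed-integer grid purely to show the round-trip and why the grid is so coarse. Training "natively" in low precision means the forward pass sees these quantized values from the start, so weights learn to sit well on the coarse grid rather than being crushed onto it after training.

```python
# Minimal sketch of block-scaled 4-bit quantization (a simplified analogue
# of formats like NVFP4, which use FP4 values plus per-block scales).
import numpy as np

def fake_quant_4bit(x, block=16):
    """Quantize each block of 16 values to a 4-bit signed grid + a scale."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7 + 1e-12  # map to [-7, 7]
    q = np.clip(np.round(x / scale), -7, 7)                   # 15-level grid
    return (q * scale).reshape(-1)                            # dequantized view

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
w_q = fake_quant_4bit(w)

err = np.abs(w - w_q).mean() / np.abs(w).mean()
print(f"mean relative error: {err:.1%}")   # small, but far from zero
```

The per-block scale is what makes 4 bits workable at all: each group of 16 weights gets its own dynamic range, so one outlier only coarsens its own block instead of the whole tensor.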
In practical benchmarks, Nemotron 3 Super demonstrates significant advantages over competing open models. It achieves 2.2x higher throughput than GPT-OSS-120B and 7.5x higher than Qwen3.5-122B in high-volume settings.[3]
The model currently holds the top position on DeepResearch Bench and DeepResearch Bench II leaderboards, which measure an AI system's ability to conduct thorough, multistep research across large document sets while maintaining reasoning coherence.[2]
Perplexity has integrated Nemotron 3 Super into its search engine and Computer AI agent system, offering users access to the model as one of 20 orchestrated models.[2]
Software development platforms including CodeRabbit, Factory, and Greptile are deploying the model alongside proprietary systems to achieve higher accuracy at lower cost for their AI agents. Life sciences organizations Edison Scientific and Lila Sciences will leverage the model for deep literature search, data science, and molecular understanding.

Enterprise AI applications are seeing rapid adoption across telecommunications, cybersecurity, and manufacturing sectors. Palantir, Amdocs, Cadence Design Systems, Dassault Systèmes, and Siemens are customizing the model to automate workflows in their respective domains.[5]
Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face, optimized for on-premises deployment on the Dell AI Factory.[2]
Nvidia released Nemotron 3 Super with open weights under the Nvidia Open Model License Agreement, providing a permissive framework for commercial use with specific safeguard clauses. The company published the complete training methodology, including over 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning, and evaluation recipes.[2]
Developers can access the model through build.nvidia.com, Perplexity, OpenRouter, and Hugging Face.[5]
This release forms part of a larger strategic initiative. A 2025 financial filing reveals Nvidia plans to invest $26 billion over the next five years building open-weight AI models.[4]
Bryan Catanzaro, VP of applied deep learning research, confirmed the company recently finished pretraining a 550-billion-parameter model. The investment responds to shifting dynamics in the open-source AI landscape, where Chinese open models increased from 1.2% of global open-model usage in late 2024 to approximately 30% by the end of 2025, according to research by OpenRouter and Andreessen Horowitz. Alibaba's Qwen overtook Meta's Llama as the most-used self-hosted open-source model.[4]
The launch of Nemotron 3 Super signals a shift in how enterprises can approach autonomous agent deployment. A software development agent can now load an entire codebase into context at once, enabling end-to-end code generation and debugging without document segmentation. In financial analysis, the model can process thousands of pages of reports in memory, eliminating the need to re-reason across long conversations and improving compute efficiency.[2]
High-accuracy tool calling ensures autonomous agents reliably navigate massive function libraries, preventing execution errors in high-stakes environments like autonomous security orchestration in cybersecurity operations.

As Kari Briski, Nvidia VP of AI Software, noted, companies moving beyond chatbots into multi-agent applications encounter significant technical constraints.[3]
Nemotron 3 Super's architecture provides the reasoning capability of a 120-billion-parameter system with the operational efficiency of a much smaller specialist, effectively reducing the thinking tax that has made complex agentic AI workflows impractical for many production environments.