Nvidia launches Nemotron 3 Super with 5x throughput boost to power enterprise AI agents

Reviewed by Nidhi Govil


Nvidia unveiled Nemotron 3 Super, a 120-billion-parameter open model designed to run complex agentic AI systems at scale. The model combines Mamba, Transformer attention, and Latent MoE architectures to deliver up to 5x higher throughput and 2x better accuracy than its predecessor. It addresses the context explosion and "thinking tax" challenges that plague multi-agent workflows, which can generate up to 15x more tokens than standard chat interactions.


Nvidia Tackles Multi-Agent Complexity With New Open Model

Nvidia has released Nemotron 3 Super, a 120-billion-parameter model specifically engineered to handle the computational demands of AI agents operating across enterprise environments. The model addresses two critical bottlenecks that have hindered agentic AI workflows: context explosion and what the company calls the "thinking tax."

Multi-agent systems can generate up to 15 times more tokens than standard chat interactions, as each step requires resending full histories including tool outputs and intermediate reasoning.

This volume increases costs dramatically and can cause agents to drift from their original objectives during long-horizon tasks like software development or cybersecurity triage.
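The growth described above can be sketched in a few lines. This is an illustrative model, not Nvidia's measurement: if each agent step appends new tokens and then resends the entire accumulated history, total tokens submitted grow quadratically in the number of steps, which is how multipliers like 15x arise in practice. The step counts and token sizes below are hypothetical.

```python
# Illustrative sketch: why resending full histories inflates token counts.
# All numbers are hypothetical, not Nvidia's measurements.

def tokens_sent(steps: int, tokens_per_step: int) -> int:
    """Total tokens submitted when each step resends the full history."""
    total = 0
    history = 0
    for _ in range(steps):
        history += tokens_per_step  # new tool output / intermediate reasoning
        total += history            # the whole history is resent each call
    return total

single_turn = tokens_sent(1, 800)    # a one-shot chat interaction
agent_loop = tokens_sent(12, 800)    # a hypothetical 12-step agent loop
print(single_turn, agent_loop)       # 800 vs 62400: quadratic growth in steps
```

Because the total follows the arithmetic series `steps * (steps + 1) / 2`, even modest agent loops dominate the cost of a comparable single-turn exchange.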

Hybrid Architecture Delivers 5x Higher Throughput

At the core of Nemotron 3 Super lies a sophisticated hybrid mixture-of-experts design that merges three distinct architectural innovations. The model interleaves Mamba-2 state-space layers with Transformer attention layers, allowing it to maintain a 1-million-token context window without the memory footprint explosion typical of pure attention mechanisms.

Mamba layers deliver 4x higher memory and compute efficiency by handling sequence processing with linear-time complexity, while strategically placed Transformer attention layers ensure precise retrieval of specific facts buried deep within codebases or financial reports. Only 12 billion of its 120 billion parameters activate during inference, keeping computational costs manageable while retaining the reasoning depth required for complex workflows.
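The interleaving idea can be sketched as a layer schedule: mostly linear-time Mamba-2 blocks, with attention blocks inserted at regular intervals for precise long-range retrieval. The depth and ratio below are assumptions for illustration, not Nemotron 3 Super's actual layout.

```python
# Hypothetical hybrid layer schedule: linear-time Mamba-2 blocks with
# sparse attention blocks interleaved. The 48-layer depth and 1-in-6
# attention ratio are illustrative, not Nemotron 3 Super's real config.

def hybrid_schedule(n_layers: int, attention_every: int = 6) -> list[str]:
    """Place an attention layer every `attention_every` layers; Mamba elsewhere."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(n_layers)
    ]

layers = hybrid_schedule(48)
print(layers.count("mamba"), layers.count("attention"))  # 40 8
```

The payoff is in memory: an attention layer's KV cache grows linearly with context length, while a Mamba layer carries a fixed-size state, so keeping attention layers sparse is what makes a 1-million-token window tractable.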

The model introduces Latent MoE, a novel technique that projects tokens into compressed space before routing them to specialists. This expert compression allows the system to consult four times as many specialists for the same computational cost as traditional mixture-of-experts designs.
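A minimal sketch of the latent-routing idea, under assumed dimensions: the token is first projected down to a smaller latent vector, and expert routing scores are computed in that compressed space, which is what makes consulting more experts affordable. Every dimension, the expert count, and the top-k value here are hypothetical.

```python
import numpy as np

# Sketch of latent MoE routing: compress the token, then score experts in
# the compressed space. All sizes below are illustrative assumptions.
rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 4096, 512, 64, 4

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
router = rng.standard_normal((d_latent, n_experts)) / np.sqrt(d_latent)

token = rng.standard_normal(d_model)
latent = token @ W_down               # compress: 4096 -> 512 dims
scores = latent @ router              # routing scores computed in latent space
chosen = np.argsort(scores)[-top_k:]  # top-k experts handle this token
print(sorted(chosen.tolist()))
```

The routing matmul costs `d_latent * n_experts` instead of `d_model * n_experts`, so with an 8x compression the router can score roughly 8x as many experts for the same cost, in line with the "four times as many specialists" framing above.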

Multi-Token Prediction further accelerates performance by predicting several future tokens simultaneously, delivering up to 3x wall-clock speedups for structured generation tasks.
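The speedup mechanism can be illustrated with a toy draft-and-verify loop: the model proposes several future tokens at once, and a verification pass commits the longest prefix it agrees with, so multiple tokens land per step instead of one. This is a generic sketch of the technique, not Nvidia's implementation.

```python
# Toy draft-and-verify loop illustrating multi-token prediction.
# The verifier and token strings below are hypothetical.

def accept_prefix(draft, verify):
    """Commit drafted tokens until the verifier first disagrees."""
    accepted = []
    for tok in draft:
        if verify(len(accepted), tok):
            accepted.append(tok)
        else:
            break
    return accepted

# A verifier that agrees with the target sequence position by position.
target = ["def", "main", "("]
committed = accept_prefix(["def", "main", ")"],
                          lambda i, tok: target[i] == tok)
print(committed)  # ['def', 'main'] -> two tokens committed in one step
```

Structured generation (code, JSON, tables) benefits most because drafted tokens are highly predictable, so long prefixes are accepted and wall-clock speedups approach the draft length.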

Compared to its predecessor, Nemotron 3 Super achieves up to 5x higher throughput and 2x better accuracy.

Blackwell GPUs Enable 4x Faster Inference

Nvidia optimized Nemotron 3 Super specifically for its Blackwell GPU platform, pretraining the model natively in NVFP4, a 4-bit floating-point format. This approach differs fundamentally from conventional methods that train models at high precision and compress them afterward, which often degrades accuracy.

By learning to operate within 4-bit arithmetic from the first gradient update, the model maintains accuracy while cutting memory requirements. On Blackwell GPUs, the model runs up to 4x faster than 8-bit models on the previous Hopper architecture with no loss in accuracy.
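The "native 4-bit" idea can be sketched as fake quantization in the forward pass: weights are snapped to the nearest value representable in a 4-bit floating-point grid on every step, so the network learns within that arithmetic from the first gradient update. The E2M1-style value grid below is a common FP4 layout used here as an assumption; NVFP4's exact format and scaling details are not specified in this article.

```python
import numpy as np

# Fake quantization to an assumed 4-bit floating-point grid (E2M1-style:
# 1 sign, 2 exponent, 1 mantissa bit). Illustrative, not the NVFP4 spec.
GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
LEVELS = np.concatenate([-GRID[::-1], GRID])  # signed representable values

def fake_quant_fp4(w: np.ndarray, scale: float) -> np.ndarray:
    """Snap scaled weights to the nearest representable FP4 value."""
    idx = np.abs(w[:, None] / scale - LEVELS[None, :]).argmin(axis=1)
    return LEVELS[idx] * scale

w = np.array([0.07, -0.24, 0.51, 0.9])
print(fake_quant_fp4(w, scale=0.25))  # [ 0.125 -0.25   0.5    1.   ]
```

Training against the quantized values, rather than quantizing a finished high-precision model, lets the optimizer compensate for the coarse grid, which is why accuracy survives the 4x memory reduction.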

In practical benchmarks, Nemotron 3 Super demonstrates significant advantages over competing open models, achieving 2.2x higher throughput than GPT-OSS-120B and 7.5x higher than Qwen3.5-122B in high-volume settings.

The model currently holds the top position on DeepResearch Bench and DeepResearch Bench II leaderboards, which measure an AI system's ability to conduct thorough, multistep research across large document sets while maintaining reasoning coherence.

Enterprise Adoption Spans Multiple Industries

Perplexity has integrated Nemotron 3 Super into its search engine and Computer AI agent system, offering users access to the model as one of 20 orchestrated models.

Software development platforms including CodeRabbit, Factory, and Greptile are deploying the model alongside proprietary systems to achieve higher accuracy at lower cost for their AI agents. Life sciences organizations Edison Scientific and Lila Sciences will leverage the model for deep literature search, data science, and molecular understanding.

Enterprise AI applications are seeing rapid adoption across telecommunications, cybersecurity, and manufacturing sectors. Palantir, Amdocs, Cadence Design Systems, Dassault Systèmes, and Siemens are customizing the model to automate workflows in their respective domains.

Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face, optimized for on-premise deployment on the Dell AI Factory.

Open Weights Release With $26 Billion Strategic Investment

Nvidia released Nemotron 3 Super with open weights under the Nvidia Open Model License Agreement, providing a permissive framework for commercial use with specific safeguard clauses. The company published the complete training methodology, including over 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning, and evaluation recipes.

Developers can access the model through build.nvidia.com, Perplexity, OpenRouter, and Hugging Face.

This release forms part of a larger strategic initiative. A 2025 financial filing reveals that Nvidia plans to invest $26 billion over the next five years in building open-weight AI models.

Bryan Catanzaro, VP of applied deep learning research, confirmed the company recently finished pretraining a 550-billion-parameter model. The investment responds to shifting dynamics in the open-source AI landscape, where Chinese open models increased from 1.2% of global open-model usage in late 2024 to approximately 30% by the end of 2025, according to research by OpenRouter and Andreessen Horowitz. Alibaba's Qwen overtook Meta's Llama as the most-used self-hosted open-source model.

Implications for Agentic AI Development

The launch of Nemotron 3 Super signals a shift in how enterprises can approach autonomous agent deployment. A software development agent can now load an entire codebase into context at once, enabling end-to-end code generation and debugging without document segmentation. In financial analysis, the model can process thousands of pages of reports in memory, eliminating the need to re-reason across long conversations and improving compute efficiency.

High-accuracy tool calling ensures autonomous agents reliably navigate massive function libraries, preventing execution errors in high-stakes environments like autonomous security orchestration in cybersecurity operations.
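One way such reliability is enforced in practice is to validate every proposed call against a registry of known tools before execution, so malformed calls fail closed rather than run. This is a generic guardrail pattern, not Nvidia's implementation; the tool names and argument schemas below are hypothetical.

```python
# Minimal guarded tool-calling sketch. Registry entries are hypothetical
# examples from a security-orchestration setting, not a real API.
TOOLS = {
    "isolate_host": {"required": {"host_id"}},
    "fetch_logs": {"required": {"host_id", "since"}},
}

def validate_call(name: str, args: dict) -> tuple[bool, str]:
    """Reject calls to unknown tools or calls missing required arguments."""
    spec = TOOLS.get(name)
    if spec is None:
        return False, f"unknown tool: {name}"
    missing = spec["required"] - set(args)
    if missing:
        return False, f"missing args: {sorted(missing)}"
    return True, "ok"

print(validate_call("isolate_host", {"host_id": "web-01"}))  # (True, 'ok')
print(validate_call("fetch_logs", {"host_id": "web-01"}))    # missing 'since'
```

A model with high tool-calling accuracy trips such guards rarely; the guard exists for the cases where it doesn't, which matters most when the downstream action is irreversible.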

As Kari Briski, Nvidia VP of AI Software, noted, companies moving beyond chatbots into multi-agent applications encounter significant technical constraints.

Nemotron 3 Super's architecture provides the reasoning capability of a 120-billion-parameter system with the operational efficiency of a much smaller specialist, effectively reducing the thinking tax that has made complex agentic AI workflows impractical for many production environments.

TheOutpost.ai

© 2026 Triveous Technologies Private Limited