
New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

TL;DR

NVIDIA launched Nemotron 3 Super, an open model with 120 billion total parameters but only 12 billion active ones, using a mixture-of-experts architecture.

Key Points

  • NVIDIA claims 5x higher throughput compared to dense models of similar scale, specifically targeting agentic AI workloads.
  • Perplexity is among the first AI-native companies to offer users direct access to the model.
  • The design prioritizes reasoning accuracy alongside low inference cost, aiming to make autonomous agent pipelines more economically viable.

Nauti's Take

5x throughput sounds like marketing magic, but the underlying MoE logic makes the claim at least plausible – as long as NVIDIA keeps the benchmarks transparent rather than cherry-picking scenarios. More interesting than the raw number is the strategic signal: NVIDIA wants to become the default stack for agentic AI, from GPU to model layer.

The open release simultaneously feeds the ecosystem that needs NVIDIA hardware to shine. Smart move – but also genuine value for developers who finally get a strong, open reasoning model built for agent workloads.

Context

The MoE architecture is no coincidence: agentic systems fire off many parallel inference calls, and costs and latency stack up fast. A model that activates only 10% of its parameters per token while delivering 120B-scale quality materially changes the economics of entire agent stacks. Anyone building agentic pipelines today needs Nemotron 3 Super on their radar – especially since NVIDIA is releasing it openly, making self-hosting a real option.
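The economics can be sanity-checked with back-of-envelope arithmetic. The sketch below is an illustration, not NVIDIA's benchmark methodology: it compares per-token forward-pass compute for the stated Nemotron 3 Super sizes against a hypothetical dense model of the same total scale, using the common ~2 FLOPs-per-parameter approximation.

```python
# Back-of-envelope: per-token compute for a dense model scales with total
# parameters, while an MoE model only pays for its active parameters.
# Parameter counts are from NVIDIA's announcement; the dense baseline is
# a hypothetical 120B dense peer, not a specific released model.

TOTAL_PARAMS_B = 120   # total parameters (billions)
ACTIVE_PARAMS_B = 12   # active parameters per token (billions)

def flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per token: ~2 * active parameters."""
    return 2 * active_params_b * 1e9

dense = flops_per_token(TOTAL_PARAMS_B)
moe = flops_per_token(ACTIVE_PARAMS_B)

print(f"active fraction: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.0%}")    # 10%
print(f"compute advantage vs. dense peer: {dense / moe:.0f}x")       # 10x
```

The naive compute ratio is 10x; that the claimed end-to-end speedup is 5x rather than 10x is consistent with real-world overheads (expert routing, memory bandwidth, batching) eating part of the theoretical gain.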

Sources