
New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

TL;DR

NVIDIA launched Nemotron 3 Super, an open model with 120 billion total parameters but only 12 billion active ones, using a mixture-of-experts architecture.

Key Points

  • NVIDIA claims 5x higher throughput compared to dense models of similar scale, specifically targeting agentic AI workloads.
  • Perplexity is among the first AI-native companies to offer users direct access to the model.
  • The design prioritizes reasoning accuracy alongside low inference cost, aiming to make autonomous agent pipelines more economically viable.

Nauti's Take

5x throughput sounds like marketing magic, but the underlying MoE logic makes the claim at least plausible – as long as NVIDIA keeps the benchmarks transparent rather than cherry-picking scenarios. More interesting than the raw number is the strategic signal: NVIDIA wants to become the default stack for agentic AI, from GPU to model layer.

The open release simultaneously feeds the ecosystem that needs NVIDIA hardware to shine. Smart move – but also genuine value for developers who finally get a strong, open reasoning model built for agent workloads.

Context

The MoE architecture is no coincidence: agentic systems fire off many parallel inference calls, and costs and latency stack up fast. A model that activates only 10% of its parameters per token while delivering 120B-scale quality materially changes the economics of entire agent stacks. Anyone building agentic pipelines today needs Nemotron 3 Super on their radar – especially since NVIDIA is releasing it openly, making self-hosting a real option.
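The economics can be sanity-checked with back-of-envelope arithmetic. The sketch below is an illustration, not NVIDIA's benchmark methodology: it compares per-token forward-pass compute for the stated Nemotron 3 Super sizes against a hypothetical dense model of the same total scale, using the common ~2 FLOPs-per-parameter approximation.

```python
# Back-of-envelope: per-token compute for a dense model scales with total
# parameters, while an MoE model only pays for its active parameters.
# Parameter counts are from NVIDIA's announcement; the dense baseline is
# a hypothetical 120B dense peer, not a specific released model.

TOTAL_PARAMS_B = 120   # total parameters (billions)
ACTIVE_PARAMS_B = 12   # active parameters per token (billions)

def flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per token: ~2 * active parameters."""
    return 2 * active_params_b * 1e9

dense = flops_per_token(TOTAL_PARAMS_B)
moe = flops_per_token(ACTIVE_PARAMS_B)

print(f"active fraction: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.0%}")    # 10%
print(f"compute advantage vs. dense peer: {dense / moe:.0f}x")       # 10x
```

The naive compute ratio is 10x; that the claimed end-to-end speedup is 5x rather than 10x is consistent with real-world overheads (expert routing, memory bandwidth, batching) eating part of the theoretical gain.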

Sources