New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

TL;DR

NVIDIA launched Nemotron 3 Super, an open model built on a mixture-of-experts architecture: it has 120 billion total parameters, but only 12 billion of them are active for any given token.

Key Points

  • NVIDIA claims 5x higher throughput compared to dense models of similar scale, specifically targeting agentic AI workloads.
  • Perplexity is among the first AI-native companies to offer users direct access to the model.
  • The design prioritizes reasoning accuracy alongside low inference cost, aiming to make autonomous agent pipelines more economically viable.
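
To see why a sparse model can be cheaper to run, here is a back-of-envelope sketch in Python. It uses only the two parameter counts quoted above. Everything else is an assumption for illustration: the rule of thumb of roughly 2 FLOPs per used parameter per token, the hypothetical dense 120B baseline, and all function and variable names. It does not reproduce NVIDIA's 5x figure, which is a measured throughput claim and not a compute ratio.

```python
# Back-of-envelope sketch, not a benchmark.
# Parameter counts come from the article; the cost model is an assumption.
TOTAL_PARAMS = 120e9   # all experts combined
ACTIVE_PARAMS = 12e9   # parameters actually used per token


def forward_flops_per_token(params_used: float) -> float:
    """Rough forward-pass cost: one multiply and one add per weight used."""
    return 2.0 * params_used


# Share of the model that does work for a single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Hypothetical dense model of the same total size: every weight is used.
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

# Mixture-of-experts model: only the routed experts are used.
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)

# How much less arithmetic the sparse model needs per token.
compute_ratio = dense_flops / moe_flops

print(f"Active fraction of parameters: {active_fraction:.0%}")
print(f"Dense baseline: {dense_flops:.2e} FLOPs per token")
print(f"MoE model:      {moe_flops:.2e} FLOPs per token")
print(f"Compute ratio (dense / MoE): {compute_ratio:.1f}x")
```

The compute ratio is an upper bound on the arithmetic saved, not a throughput prediction. All 120 billion parameters still have to sit in memory, and routing overhead, attention cost, and memory bandwidth all reduce the real-world gain, which is one reason a measured speedup can land well below the raw ratio.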

Nauti's Take

5x throughput sounds like marketing magic, but the underlying MoE logic (only a tenth of the parameters are active at a time) makes the claim at least plausible – as long as NVIDIA keeps the benchmarks transparent rather than cherry-picking scenarios. More interesting than the raw number is the strategic signal: NVIDIA wants to become the default stack for agentic AI, from GPU to model layer.

The open release simultaneously feeds the ecosystem that needs NVIDIA hardware to shine. Smart move – but also genuine value for developers who finally get a strong, open reasoning model built for agent workloads.

Sources