
With Nvidia Groq 3, the Era of AI Inference Is (Probably) Here

TL;DR

At GTC in San Jose (30,000+ attendees), Nvidia CEO Jensen Huang unveiled the Vera Rubin chip line — Nvidia's first chip designed specifically for AI inference.

Key Points

  • The Nvidia Groq 3 LPU (language processing unit) incorporates IP licensed from startup Groq for US $20 billion, a deal struck on Christmas Eve 2024.
  • Huang declared the inflection point of inference has arrived: AI must now 'think' and 'do,' both of which require inference rather than training workloads.
  • Training and inference demand distinct computational profiles; Nvidia has historically focused on training hardware.

Nauti's Take

Nvidia spends $20 billion licensing Groq's IP, brands the result 'Groq 3,' and calls it an inflection point — which is both a smart competitive move and an implicit admission that its training-focused architecture needed help for the inference era. Huang's soundbites about AI needing to 'think' and 'do' are vintage GTC theater, but the underlying point is real: inference is where the volume and the margin will be for the next decade.

The more interesting story is what this means for Groq as a standalone company, and for AMD and Intel, which now face an Nvidia that has shored up its last obvious weakness. The inference wars just got a lot more expensive.

Context

For years, training dominated the AI hardware conversation: whoever builds the biggest clusters wins. But real-world AI deployment runs almost entirely on inference — millions of requests per second where latency and cost matter enormously. Nvidia entering the dedicated inference hardware space, and paying $20 billion to license Groq's IP to do so, signals how seriously the industry is taking this shift.

For enterprises, it could mean cheaper and faster AI deployments, assuming Nvidia delivers on the promise.

Sources