With Nvidia Groq 3, the Era of AI Inference Is (Probably) Here
TL;DR
At GTC in San Jose (30,000+ attendees), Nvidia CEO Jensen Huang unveiled the Vera Rubin chip line — Nvidia's first chip designed specifically for AI inference.
Key Points
- The Nvidia Groq 3 LPU (language processing unit) incorporates IP licensed from startup Groq for US $20 billion, a deal struck on Christmas Eve 2024.
- Huang declared that the inflection point for inference has arrived: AI must now 'think' and 'do,' both of which require inference workloads rather than training.
- Training and inference demand distinct computational profiles; Nvidia has historically focused on training hardware.
Nauti's Take
Nvidia spends $20 billion licensing Groq's IP, brands the result 'Groq 3,' and calls it an inflection point — which is both a smart competitive move and an implicit admission that its training-focused architecture needed help for the inference era. Huang's soundbites about AI needing to 'think' and 'do' are vintage GTC theater, but the underlying point is real: inference is where the volume and the margin will be for the next decade.
The more interesting story is what this means for Groq as a standalone company, and for AMD and Intel, which now face an Nvidia that has shored up its last obvious weakness. The inference wars just got a lot more expensive.
Context
For years, training dominated the AI hardware conversation — whoever builds the biggest clusters wins. But real-world AI deployment runs almost entirely on inference: millions of requests per second where latency and cost matter enormously. Nvidia entering the dedicated inference hardware space — and paying $20 billion to license Groq's IP to do so — signals how seriously the industry is taking this shift.
For enterprises, it could mean cheaper and faster AI deployments, assuming Nvidia delivers on the promise.