
Nvidia built its AI empire on GPUs. But its massive $20 billion deal with Groq signals the company is not certain GPUs will be the only technology that matters in AI's critical next phase: running models at scale, a process known as inference.
The race to lead in AI inference comes down to economics. Once a model is trained, everything it does in practice, from answering questions and writing code to recommending products, summarizing text, powering chatbots, and interpreting images, happens during inference. This is the point where AI shifts from a capital expense to a revenue-generating service, which brings intense pressure to cut costs, minimize latency (the wait before an AI responds), and improve efficiency.
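To make that cost-and-latency pressure concrete, here is a back-of-the-envelope sketch in Python. Every number in it (the hourly price of an accelerator, its token throughput, the per-user streaming rate) is a hypothetical assumption chosen for illustration, not a figure from Nvidia, Groq, or any cloud provider.

```python
# Back-of-the-envelope inference economics.
# Every constant below is a hypothetical assumption for illustration only.

ACCELERATOR_COST_PER_HOUR = 4.00  # assumed rental price of one accelerator, USD
TOKENS_PER_SECOND = 10_000        # assumed aggregate throughput of that accelerator

# Cost to generate one million output tokens on this hypothetical setup.
tokens_per_hour = TOKENS_PER_SECOND * 3600
cost_per_million_tokens = ACCELERATOR_COST_PER_HOUR / tokens_per_hour * 1_000_000

# Latency side of the ledger: if each user must be streamed ~50 tokens/sec
# to feel responsive, how many users can one accelerator serve at once?
PER_USER_TOKENS_PER_SECOND = 50
concurrent_users = TOKENS_PER_SECOND // PER_USER_TOKENS_PER_SECOND

print(f"cost per 1M tokens: ${cost_per_million_tokens:.2f}")  # ~$0.11 here
print(f"users served at 50 tok/s each: {concurrent_users}")   # 200 here
```

Under these made-up numbers, shaving cost per token or packing more concurrent users onto the same hardware flows straight into margin, which is why every point of inference efficiency is fought over.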
This very pressure is why inference has emerged as the next major profit battlefield in the industry. It also explains why Nvidia, in a deal announced right before Christmas, secured a technology license from Groq, a startup developing chips engineered specifically for rapid, low-latency AI inference, and brought on most of its team, including founder and CEO Jonathan Ross.
Inference is AI’s ‘industrial revolution’
Nvidia CEO Jensen Huang has been vocal about the difficulties of inference. While he maintains that Nvidia is “excellent at every phase of AI,” he said on the company’s Q3 earnings call in November that inference is “really, really hard.” Modern inference is far from a simple input-output transaction: it has to sustain continuous reasoning, serve millions of users simultaneously, keep latency consistently low, and stay within strict cost limits. AI agents, which carry out multi-step tasks, are poised to multiply both the demand for inference and its complexity, raising the stakes when it fails.
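The serving problem Huang describes is, at its core, a batching trade-off, and a toy sketch makes it visible. In the illustrative asyncio code below, the names (`micro_batcher`, `infer`) and timing constants are invented for this example and do not come from any real serving framework: holding requests briefly lets one model call serve a whole batch, but every millisecond spent waiting is latency added for the user who arrived first.

```python
import asyncio
import time

# Toy micro-batching loop illustrating the core tension in inference serving.
# All names and constants are invented for illustration, not a real API.
MAX_BATCH = 8      # assumed maximum requests served by one model call
MAX_WAIT_MS = 5    # assumed longest we will hold the earliest request

queue: asyncio.Queue = asyncio.Queue()

async def run_model(batch):
    # Stand-in for one forward pass that answers the whole batch at once.
    await asyncio.sleep(0.02)
    for prompt, fut in batch:
        fut.set_result(f"reply to {prompt!r} (batch size {len(batch)})")

async def micro_batcher():
    while True:
        first = await queue.get()                  # wait for one request
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH:              # fill the batch, but never
            timeout = deadline - time.monotonic()  # hold past the deadline
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        await run_model(batch)

async def infer(prompt: str) -> str:
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def main():
    asyncio.create_task(micro_batcher())
    replies = await asyncio.gather(*(infer(f"q{i}") for i in range(20)))
    print("\n".join(replies))

asyncio.run(main())
```

Production servers push the same idea much further, layering on techniques such as continuous batching and key-value-cache management, which is part of what makes the “one shot” view of inference so misleading.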
“People think that inference is one shot, and therefore it’s easy. Anybody could approach the market that way,” Huang said. “But it turns out to be the hardest of all, because thinking, as it turns out, is quite hard.”
Nvidia’s bet on Groq underscores that conviction, and it signals that even the dominant force in AI training is hedging against more than one way the economics of inference could play out.
Huang has also been direct about how important inference will be to AI’s expansion. In a recent appearance on the BG2 podcast, he noted that inference already accounts for more than 40% of AI-related revenue and predicted that it is “about to go up by a billion times.”
“That’s the part that most people haven’t completely internalized,” Huang said. “This is the industry we were talking about. This is the industrial revolution.”
That certainty helps explain why Nvidia is willing to move aggressively on how inference gets delivered, even while the underlying economics remain in flux.
Nvidia wants to corner the inference market
According to Karl Freund, founder and principal analyst at Cambrian AI Research, Nvidia is diversifying its strategy to ensure it has a stake in every segment of the market. “It’s a little bit like Meta acquiring Instagram,” he explained. “It’s not that they thought Facebook was bad, they just knew that there was an alternative that they wanted to make sure wasn’t competing with them.”
This is notable, especially since Huang had previously voiced strong confidence in the economics of Nvidia’s own inference platform. “I suspect they found that it either wasn’t resonating as well with clients as they’d hoped, or perhaps they saw something in the chip-memory-based approach that Groq and another company called D-Matrix have,” said Freund, referring to another fast, low-latency AI chip startup, backed by Microsoft, that raised $275 million at a $2 billion valuation.
Freund suggested that Nvidia’s move involving Groq could benefit the entire sector. “I’m sure D-Matrix is a pretty happy startup right now, because I suspect their next round will go at a much higher valuation thanks to the [Nvidia-Groq deal],” he said.
Other industry leaders note that the economics of AI inference are shifting as AI expands from chatbots into real-time applications such as robotics, drones, and security systems. Those applications cannot tolerate the round-trip delay of sending data to the cloud, or uncertainty about whether compute will be available the moment they need it, which pushes them toward specialized chips like Groq’s rather than centralized GPU clusters.
Behnam Bastani, founder and CEO of OpenInfer, a company focused on performing AI inference at the “edge”—close to where data is created, like on devices, sensors, or local servers instead of remote cloud data centers—said his startup is aiming for these types of applications.
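For a sense of what inference at the edge looks like in code, here is a minimal, generic sketch using ONNX Runtime to run a model entirely on a local CPU. The model file name and input assumptions are placeholders for whatever model a device would actually ship with; this shows the general pattern, not OpenInfer’s stack.

```python
import numpy as np
import onnxruntime as ort

# Generic edge-inference sketch: load a model that ships on the device and
# run it locally, so no request ever travels to a cloud data center.
# "model.onnx" is a placeholder path, and the float32 input is an assumption
# about the model; this illustrates the pattern, not OpenInfer's actual stack.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_meta = session.get_inputs()[0]
# Build a dummy input matching the model's declared shape,
# substituting 1 for any symbolic (dynamic) dimensions.
shape = [dim if isinstance(dim, int) else 1 for dim in input_meta.shape]
x = np.random.rand(*shape).astype(np.float32)

# The forward pass is a local function call, not a network hop, so latency
# is bounded by the device's own silicon rather than by connectivity.
outputs = session.run(None, {input_meta.name: x})
print(outputs[0].shape)
```

That locality is exactly the property real-time workloads like robotics and security systems need: the latency budget is spent on computation, not on the round trip to a remote cluster.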
He stressed that the inference market is still in its early days, and that the Groq deal is Nvidia’s bid to dominate it. Because the economics of inference remain unsettled, Bastani said, Nvidia is trying to position itself as a provider spanning the full spectrum of inference hardware rather than committing to a single architecture.
“It positions Nvidia as a bigger umbrella,” he said.
