4 Sources
[1]
Report: Nvidia is working on a top secret AI inference chip that could debut next month - SiliconANGLE
Nvidia Corp. is reportedly working on a dedicated inference processor that will be used by OpenAI Group PBC and other artificial intelligence companies to develop faster and more efficient models, according to a report late Friday in the Wall Street Journal. The new inference platform is expected to be launched at Nvidia's annual GTC developer conference in San Jose next month, and will integrate technology the company acquired from the chip startup Groq Inc. in December. Inference, which refers to the process of running pre-trained AI models in production, has emerged as a key area of focus in the AI industry. Nvidia rivals such as Google LLC and Amazon Web Services Inc. have both developed specialized inference chips that compete with its graphics processing units, and it also faces competition from dedicated inference chip startups such as Cerebras Systems Inc. and SambaNova Systems Inc. The Journal said OpenAI has had early access to Nvidia's new inference chip and will become one of its earliest adopters, in what amounts to a significant win for the chipmaker. While OpenAI has been shopping for more efficient alternatives to Nvidia's GPUs in order to diversify its computing stack, it received $30 billion in funding from the world's top chipmaker last week in a deal that reaffirms its commitment to the company. Nvidia is the world's most dominant maker of GPUs, which are specialized processors that can perform billions of tasks simultaneously. But although the company continues to insist that they're useful for both training and inference, its GPUs are no longer considered the most efficient option for powering AI applications. Many companies have found that Nvidia's chips consume too much energy, making them extremely costly for applications such as AI agents, which carry out tasks autonomously on behalf of human users and require immense computing power.
That's why OpenAI signed a multibillion-dollar contract with Cerebras last month to access its dinner plate-sized inference-focused chips. Cerebras claims that its silicon is much faster than Nvidia's GPUs when it comes to inference tasks. Nvidia's inference chip is reportedly going to integrate technology developed by Groq. Nvidia paid $20 billion to license Groq's technology on a nonexclusive basis in December, and as part of that deal it also hired Groq's founding Chief Executive Officer Jonathan Ross and its President Sunny Madra. The deal was billed at the time as one of the largest acqui-hires in Silicon Valley's history. Groq's inference chips are known as "language processing units," and they're based on an entirely novel architecture that enables them to perform inference with much lower energy usage. However, Nvidia hasn't said how it plans to use the startup's technology. OpenAI reportedly wants to use Nvidia's new inference chip to power its Codex programming tool, which is a rival to Anthropic PBC's Claude Code. Coding applications have emerged as one of the most powerful and profitable use cases for generative AI, and it's an area where OpenAI is only second best, as Claude Code is widely considered the market leader. Nvidia is also pushing its central processing units as another alternative for running inference workloads. Traditionally, most companies pair its GPUs with CPUs, using the two chips in tandem to compensate for each other's inefficiencies. But Nvidia says some agentic AI workloads can actually run more efficiently on its most advanced Grace CPUs alone. Last month, Meta Platforms Inc. became the first company to commit to a sizable CPU-only deployment to support its ad-targeting agents in production.
[2]
Nvidia plans new chip to speed AI processing: WSJ - The Economic Times
Nvidia plans to launch a new processor to help OpenAI and other clients run AI systems faster and more efficiently. The chip is for "inference" computing and will debut at Nvidia's GTC conference. It incorporates a design by startup Groq. Nvidia plans to launch a new processor designed to help OpenAI and other customers build faster, more efficient AI systems, the Wall Street Journal reported on Friday, citing people familiar with the matter. Nvidia is developing a new system for "inference" computing, a form of processing that allows AI models to respond to queries, the report said. The new platform is set to be unveiled at Nvidia's GTC developer conference in San Jose next month and will incorporate a chip designed by startup Groq, the report added, citing people familiar with the matter. Reuters could not immediately verify the report. Nvidia and OpenAI did not immediately respond to Reuters' requests for comment. Reuters earlier this month reported that OpenAI is dissatisfied with the speed at which Nvidia's hardware can spit out answers to ChatGPT users for specific types of problems such as software development and AI communicating with other software. It needs new hardware that would eventually provide about 10% of OpenAI's inference computing needs, one of the sources told Reuters. The ChatGPT maker has discussed working with startups including Cerebras and Groq to provide chips for faster inference, two sources said. But Nvidia struck a $20-billion licensing deal with Groq that shut down OpenAI's talks, one of the sources told Reuters. In September, Nvidia said it intended to pour as much as $100 billion into OpenAI as part of a deal that gave the chipmaker a stake in the startup and gave OpenAI the cash it needed to buy the advanced chips.
[3]
OpenAI Is Set to Be the Biggest Customer for the Upcoming NVIDIA-Groq AI Chip, Allocating 3GW of Dedicated 'Inference Capacity'
OpenAI's newest partnership with NVIDIA not only focuses on Vera Rubin but also on inference capacity, which will be provided by the upcoming NVIDIA-Groq solution. OpenAI is currently engaged in financing deals with infrastructure partners all across the AI industry, and the AI giant recently announced $110 billion in fresh capital, driven by the likes of NVIDIA, SoftBank, and Amazon. OpenAI calls the investments a necessity to keep the AI bandwagon up and running, and they have been one of the ways the firm has secured the computing power it needs. A WSJ report reveals that NVIDIA will showcase its Groq-focused "processor" at this year's GTC 2026, in line with our previous reporting. More specifically, OpenAI will be the biggest customer of the upcoming solution, which is an interesting decision. In NVIDIA's recent commitment to OpenAI, it was revealed that the latter will use 3GW of "dedicated inference capacity," likely coming from what NVIDIA will showcase in March. Earlier reports have suggested that inference is a major concern for OpenAI, and that the company had been 'displeased' with what NVIDIA had been offering to address inference. As the WSJ put it: "OpenAI has agreed to become one of the largest customers of the new processor, some of the people said, representing a major win for Nvidia. The ChatGPT maker, which is one of Nvidia's largest customers, has spent the past few months shopping for more efficient alternatives to Nvidia's chips." OpenAI was reported to be in talks with Cerebras and Groq as well to enter into agreements focused on providing optimal performance for latency-sensitive workloads. But now, it appears that OpenAI is sticking with NVIDIA, likely indicating that the upcoming solution built around Groq's LPUs is promising enough for the AI giant to commit to 3GW of capacity. Regarding what we expect from the NVIDIA-Groq arrangement, the most likely solution is a hybrid compute tray configuration, as discussed here.
For now, we expect major announcements from NVIDIA at this year's GTC, focusing on Vera Rubin, possibly next-gen Feynman, and the solution built around Groq.
[4]
NVIDIA's GTC 2026 Reveal: AI Processor Featuring Groq Technology for OpenAI
This could strengthen NVIDIA's position in advanced AI infrastructure, expand its custom silicon strategy beyond GPUs, and deepen ties with key AI developers, reinforcing its leadership in the rapidly evolving AI hardware ecosystem. The new system is expected to leverage architecture from Groq, the "acqui-hire" startup whose founder joined Nvidia last year. By moving toward Language Processing Units (LPUs), NVIDIA aims to solve the "bottleneck" of AI decoding. The platform would focus on inference computing and include a chip designed by startup Groq. This processor was designed to help OpenAI and other customers build faster, more efficient AI systems. One source said Nvidia struck a US$20 billion licensing deal with Groq.
Nvidia is set to launch a dedicated AI inference chip at its GTC conference next month, integrating technology from startup Groq acquired in a $20 billion licensing deal. OpenAI will be among the earliest adopters, committing 3GW of dedicated inference capacity to address growing demands for faster, more efficient AI processing.
Nvidia is developing a specialized AI inference chip that will debut at its annual GTC developer conference in San Jose next month, according to reports from the Wall Street Journal [1][2]. The new platform represents a strategic shift for the chipmaker as it addresses mounting pressure in inference computing, where rivals like Google and Amazon Web Services have already deployed specialized chips that compete with Nvidia's traditional GPUs [1]. The processor is designed to help OpenAI and other customers accelerate AI processing and run pre-trained AI models more efficiently in production environments.
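To make the training/inference distinction in these reports concrete, here is a toy sketch (a hypothetical two-line model, not anything from Nvidia's, Groq's, or OpenAI's actual stack): training fits a model's parameters once, which is expensive but done rarely, while inference reuses the frozen parameters to answer each new query, which is cheap per call but repeated billions of times in production.

```python
def train(xs, ys):
    """One-time, costly phase: fit y = w*x by least squares."""
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return w  # the "pre-trained" parameter

def infer(w, x):
    """Per-request phase: run the frozen model on a new query."""
    return w * x

# Train once...
w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # learns w = 2.0
# ...then serve many queries with the fixed parameter.
print(infer(w, 10.0))  # prints 20.0
```

Because the serving phase dominates at scale, per-query latency and energy, rather than raw training throughput, are what specialized inference chips like Groq's LPUs optimize for.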
The upcoming NVIDIA-Groq AI chip integrates technology from startup Groq, which Nvidia licensed in a $20 billion deal in December [1][4]. As part of the arrangement, Nvidia hired Groq's founding CEO Jonathan Ross and President Sunny Madra in what was billed as one of Silicon Valley's largest-ever acqui-hires. Groq's chips, known as Language Processing Units or LPUs, are built on an entirely novel architecture that enables inference tasks with significantly lower energy consumption compared to traditional GPUs [1][4]. This licensing deal effectively shut down OpenAI's talks with other inference chip providers, including Cerebras and Groq itself [2].
OpenAI has secured early access to Nvidia's new AI inference chip and will become one of its largest customers, committing 3GW of dedicated inference capacity [3]. This represents a major win for Nvidia, particularly as OpenAI had been actively shopping for more efficient alternatives for latency-sensitive AI workloads. The ChatGPT maker has been dissatisfied with the speed at which Nvidia's current hardware can deliver answers for specific problems such as software development and AI-to-AI communication [2]. OpenAI reportedly needs new hardware that would eventually provide about 10% of its inference computing needs [2]. The company plans to use the new chip to power its Codex programming tool, which competes with Anthropic's Claude Code in the lucrative AI coding market [1].
The shift toward dedicated inference processors addresses a critical bottleneck in AI decoding that has plagued the industry [4]. While Nvidia's GPUs remain dominant for training AI models, they are no longer considered the most efficient option for running AI applications in production. Many companies have found that Nvidia's chips consume excessive energy, making them costly for applications like AI agents that require immense computing power to carry out tasks autonomously [1]. This efficiency gap has driven companies like OpenAI to sign multibillion-dollar contracts with competitors such as Cerebras and SambaNova, which offer specialized inference-focused chips [1]. The new platform could strengthen Nvidia's position in AI infrastructure by expanding its custom silicon strategy beyond traditional GPUs [4], reinforcing its leadership as demand for efficient computing power continues to surge across the rapidly evolving AI hardware ecosystem.