5 Sources
[1]
Google Eyes New Chips to Speed Up AI Results, Challenging Nvidia
In a matter of months, Google's AI chips have become one of the hottest commodities in the tech sector. Leading artificial intelligence developers, including some of the firm's biggest rivals, are stocking up on them. Now, the Alphabet Inc.-owned company aims to build on its momentum with the likely introduction of new chips dedicated to inference, or running AI models after they've been trained. With this push, Google is poised to further challenge market leader Nvidia Corp. in a fast-growing category for semiconductors that's fueled by surging adoption of AI software.

As demand grows for quickly processing AI queries, "it now becomes sensible to specialize chips more for training or more for inference workloads," Google Chief Scientist Jeff Dean said in an interview. "We are looking at a whole bunch of different things," he added, including the speed of AI results it wants to enable.

The company plans to announce its new generation of custom-designed chips, known as tensor processing units, or TPUs, at the Google Cloud Next conference in Las Vegas this week. Amin Vahdat, who oversees Google's AI infrastructure and chip work, declined to comment on plans for an inference chip that can speed up AI outputs, but said more will likely be shared "in the relatively near future."

Nvidia's graphics processing units, or GPUs, remain the gold standard for AI, particularly for training more advanced models. But a growing number of up-and-comers are vying to take on the chipmaker for inference uses, including by offering chips meant to cut down response times for chatbots and AI agents. Last month, Nvidia began selling a chip intended for faster inference based on technology it acquired from Groq as part of a reported $20 billion licensing deal.

Google brings unique strengths to that competitive landscape, including a decade of experience designing chips, vast resources from its online search profits and firsthand insights on AI models. Among the top AI developers, only Google makes its own chips at significant scale, allowing it to share vital feedback between teams to better customize hardware. (OpenAI is only now starting to design its own.)

In a recent podcast interview, Nvidia's Jensen Huang stressed the advantages of his company's chips, saying they can do "a whole bunch of applications" that "you can't do with TPUs." Google, for its part, relies on a mix of TPUs and GPUs for its own work. "A lot of people would like to run on both," Demis Hassabis, chief executive officer of Google DeepMind, told Bloomberg. Interest in TPUs is particularly high from leading AI labs, he said.

Google has previously touted inference capabilities for its chips. It also considered releasing separate chips for training and inference early on, according to Partha Ranganathan, a vice president and engineering fellow at Google, but so far it's resisted that approach. That might change soon as the AI spending boom moves from training to inference.

"The battleground is shifting towards inference," said Chirag Dekate, an analyst at Gartner, who notes that in his experience Google's Gemini model is the fastest at responding to complex reasoning tasks. "In that battleground, Google has an infrastructure advantage."

Already, today's TPUs are a strong choice for processing results for the emerging crop of AI agents that field more complex work on a user's behalf, according to Natalie Serrino, co-founder at Gimlet Labs, a startup that makes software for routing AI tasks to the best chip for each job.
"They are very good tools for the workload that is exploding," she said. An overnight success that took a decade Google's long-simmering chip efforts gained new attention in October when Anthropic PBC -- one of the most closely watched AI developers -- unveiled an expanded agreement to access as many as 1 million TPUs. The next month, Google debuted the more advanced Gemini 3 model, trained and run on TPUs, to rave reviews. Since then, demand for Google's chips has only grown among large firms. Meta Platforms Inc. signed a multibillion-dollar deal to use TPUs through Google Cloud over several years. The company just received access to its first significant supply and is testing them out to see what tasks they're best suited for, said Santosh Janardhan, Meta's head of infrastructure. "It does look like there might be inference advantages," he said, while noting that "no new platform is without hurdles and a learning curve." Anthropic also signed a deal with Broadcom Inc., Google's TPU partner, for chips that will enable it to tap into about 3.5 gigawatts of computing power starting in 2027. Citadel Securities plans to present at the Google conference about how TPUs let the company train models faster than previous work with GPUs. And G42, the Abu Dhabi technology conglomerate, has held "multiple discussions" with Google about using its TPUs, according to Talal Al Kaissi, the interim CEO of Core42, the firm's cloud unit. "I'm very bullish," Al Kaissi said about the talks. Google is already taking new steps to meet customers where they are. The company is testing out letting companies like Anthropic run some of their TPUs in their own data centers rather than Google's facilities, according to a person familiar with the matter. It has also enabled TPU customers to use outside tools like PyTorch as well as other scheduling software rather than solely relying on Google's products, Vahdat said. Those changes are helping shift perception for chips that were born out of Google's computing bottlenecks and long thought of as primarily useful for the company to meet its own needs. After Dean, Google's chief scientist, started building an earlier AI software system to let people use language translation and voice recognition services, he realized there was no way that even Google could afford to deliver it using available chips and hardware. At the same time, the central processing units Google relied on for AI were improving at a slower rate. The company decided it should build an accelerator that focused on a narrower set of tasks that might rack up the biggest bills for AI. The key idea behind the TPU is that it "solves a small number of problems but the amount of computation required for them was enormous," said Vahdat, a former computer science professor who played an early, key role in pushing Google to adopt the optical switches that help connect TPUs into supercomputers. "The conventional wisdom at the time was you don't build specialized hardware." Over the years, Google's TPUs have evolved alongside its AI work. A seminal 2017 Google research paper that gave rise to today's large language models also pushed the TPU team to focus on chips for training bigger AI systems. Later, Google DeepMind and the chips team noticed that TPUs were sitting unused too often when deployed for reinforcement learning, a popular method for improving AI systems at specific tasks. The TPU team adjusted how they network various semiconductors to get the data flowing faster and avoid chips sitting idle. 
That dynamic continues today as Google debates how many chips to link together in a single pod or whether the hardware can be less precise in order to save money. "A lot of those things are informed by the model experiments," Hassabis said. In the future, he would love the TPU team to consider making an accelerator for edge-of-network cases, where the chip is placed closer to users rather than being accessed via the cloud, in order to reduce latency.

Along the way, Google has also built systems to more rapidly spot manufacturing flaws that can have an outsize impact on software. When working with AI accelerator chips that manage massive amounts of math, even a subtle failure can metastasize and cause a model to "completely self-destruct," said Paul Barham, the Google distinguished scientist who co-leads the Gemini infrastructure team. An issue like that struck Google about two years ago and took weeks to diagnose, he said, describing these as "bugs from hell." "We now have to do that with hundreds of thousands of accelerator chips within 10 seconds," he said.

The guessing game

For all its expertise in AI development, Google faces a similar challenge to other chipmakers: chips usually take about three years to develop from start to finish, but AI models are evolving much faster. That makes it difficult to predict what customers will want several years out. "If anybody claims they know what Gemini 10 is going to look like, I'm like, 'Please give me whatever you're smoking,'" Ranganathan said.

Barham also worries that the tight feedback loop between the AI model creators and the hardware designers runs the risk of missing new ideas. There's "this cycle that traps you into what works well on the current software and hardware," he said. To strike a middle ground, the TPU team sometimes aims for a chip that is good enough for various uses, even if it's not perfect for each. The other option, Vahdat said, is to plan two different designs. Both may not ship, but they could if the use case for each is compelling enough.

As Google's chips become more popular, the company risks supply constraints, not unlike Nvidia. One startup executive, who spoke on condition of anonymity to discuss internal matters, said their company's use of TPUs has been limited by availability and complained that Google had effectively given all its chips to Anthropic. "Mostly we're sort of favoring what supply we do have to the more elite teams who obviously are the ones that could maybe take the most advantage out of what the TPUs do best," Hassabis said, referring to top AI firms.

Going forward, Google will also need to decide how to allocate TPUs between its own growing slate of competitive AI services and its burgeoning roster of customers. "There are benefits to making TPUs only for Google, but there are substantial downsides," Vahdat said. "Eventually you wind up on what we refer to as a tech island. It might be a beautiful island, but it's going to be limited in population and it's going to be limited in diversity. In the end, it's probably going to be less good."
[2]
Google assembles four-partner chip supply chain with Broadcom, MediaTek, Marvell to challenge Nvidia in inference
Summary: Google is building the AI industry's most diversified custom chip supply chain, with four design partners (Broadcom, MediaTek, Marvell, Intel) and a roadmap stretching from the Ironwood TPU now shipping in the millions to TPU v8 chips at TSMC 2nm in late 2027. The strategy, detailed ahead of Google Cloud Next, splits the next generation explicitly: Broadcom's "Sunfish" for training, MediaTek's "Zebrafish" for inference at 20-30% lower cost, with Marvell in talks to add a memory processing unit and an additional inference TPU, positioning Google's custom silicon as the most direct challenge to Nvidia's dominance in AI inference.

Google is assembling the most diversified custom chip supply chain in the AI industry, with four design partners, a fabrication relationship with TSMC, and a product roadmap that now stretches from the inference chips it is shipping today to the 2-nanometre processors it expects to deploy in late 2027. The strategy, detailed in a Bloomberg feature ahead of Google Cloud Next this week, positions Google's silicon programme as the most direct challenge to Nvidia's dominance in AI inference, the phase of computing where models serve users rather than learn from data.

The centrepiece is Ironwood, Google's seventh-generation TPU and the first designed specifically for inference. It delivers ten times the peak performance of the TPU v5p, offers 192 gigabytes of HBM3E memory per chip with 7.2 terabytes per second of bandwidth, and scales to 9,216 liquid-cooled chips in a single superpod producing 42.5 FP8 exaflops. Ironwood is now generally available to Google Cloud customers. Google plans to produce millions of units this year, Anthropic has committed to up to one million TPUs, and Meta also has a rental arrangement.

Google's chip programme now involves four distinct design partners, each handling a different segment of the product line. Broadcom, which signed a long-term agreement on 6 April to supply TPUs and networking components through 2031, handles the high-performance chip variants. It is also designing the next-generation TPU v8 training chip, codenamed "Sunfish," targeted at TSMC's 2-nanometre process node for late 2027. Broadcom commands more than 70% of the custom AI accelerator market and is projecting $100 billion in AI chip revenue by 2027.

MediaTek is designing the cost-optimised inference variant of the TPU v8, codenamed "Zebrafish," also targeting TSMC 2nm in late 2027. MediaTek's involvement began with the I/O modules and peripheral components on Ironwood, where its designs run 20 to 30% cheaper than alternatives. The TPU v8 strategy splits the product line explicitly: Broadcom builds the training chip, MediaTek builds the inference chip, and Google gains the negotiating leverage that comes from each partner knowing the other exists.

Marvell Technology, which is in talks with Google to develop a memory processing unit and a new inference-focused TPU, would become the third TPU design partner if those negotiations produce a contract. Google plans to produce nearly two million of the memory processing units, with design finalisation expected by next year. Marvell's custom silicon business runs at a $1.5 billion annual rate across 18 cloud-provider design wins, and Nvidia invested $2 billion in the company in March.

Intel entered the picture on 9 April with a multi-year deal to supply Xeon processors and custom infrastructure processing units for Google's AI data centre infrastructure. The arrangement covers the networking and general-purpose compute layers that surround the TPUs rather than the AI accelerators themselves.

TSMC fabricates all of Google's custom silicon. The relationship is structural: every chip Google designs, regardless of which partner designed it, runs through TSMC's fabs.

The shift from training to inference as the dominant AI compute cost is the strategic premise behind Google's entire chip programme. Training a frontier model is a singular, intensive event. Inference is continuous and scales with every user, every query, and every product that incorporates AI. Google serves billions of AI-augmented search queries, Gemini conversations, and Cloud AI API calls daily. At that scale, the cost per inference determines the economics of the entire AI business.

Nvidia's GPUs remain dominant for training workloads, where their programmability and the CUDA software ecosystem create switching costs that custom chips cannot easily replicate. But inference workloads are more predictable, more repetitive, and more amenable to the kind of fixed-function optimisation that custom silicon excels at. A purpose-built inference chip that costs less per query than an Nvidia GPU, even if it cannot match the GPU's versatility, wins on the metric that matters at Google's scale.

This is why Google is investing in multiple inference chip paths simultaneously. Ironwood serves today's workloads. MediaTek's Zebrafish targets the next generation at lower cost. Marvell's proposed chips would add yet another option. The redundancy is deliberate: Google is building optionality into a supply chain where dependence on any single partner creates pricing risk, capacity risk, and the strategic vulnerability of having its AI infrastructure controlled by someone else's roadmap.

Google's total expected TPU shipments are projected at 4.3 million units in 2026, scaling to more than 35 million by 2028. Anthropic's commitment alone represents up to one million of those chips, with access to approximately 3.5 gigawatts of next-generation TPU-based compute starting in 2027. Mizuho estimates Broadcom's AI revenue from its Google and Anthropic relationships at $21 billion in 2026, rising to $42 billion in 2027.

The custom ASIC market more broadly is growing faster than GPUs. TrendForce projects custom chip sales will increase 45% in 2026, compared with 16% growth in GPU shipments. The market is expected to reach $118 billion by 2033. Google is not the only hyperscaler building custom inference silicon: Amazon has Trainium and Inferentia, Microsoft has Maia, and Anthropic is exploring its own chip programme. But Google's multi-partner, multi-generation approach is the most architecturally ambitious.

Google Cloud Next opens on Wednesday in Las Vegas with keynotes from Sundar Pichai and Thomas Kurian. The conference is expected to showcase the next-generation TPU architecture and the custom silicon roadmap that connects Ironwood to the v8 generation. The timing of the Bloomberg feature, one day after The Information broke the Marvell talks and two days before Cloud Next, suggests Google is using the conference to frame its chip programme as a coherent strategy rather than a series of individual partnerships.

The challenge Nvidia faces is not that any single Google chip will outperform its GPUs. It is that Google is building a system in which multiple custom chips, each optimised for a specific workload and cost point, collectively reduce the share of Google's AI compute that runs on Nvidia hardware.
Nvidia's response has been to embed itself in the custom chip ecosystem rather than fight it: the $2 billion Marvell investment and the NVLink Fusion programme ensure Nvidia retains a position in racks where its GPUs are supplemented or replaced by ASICs. For Google, the bet is that controlling its own silicon, across multiple partners and multiple generations, will produce a cost advantage in inference that compounds over time.

The scale of Nvidia's business means the incumbent will not be displaced quickly. But the economics of inference favour custom silicon over general-purpose GPUs, and no company has more inference volume than Google. The four-partner supply chain, the dual-track v8 roadmap, and the millions of Ironwood chips shipping this year are the infrastructure for a competitive position that Google expects to strengthen with every query it serves.
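The cost-per-inference argument above is ultimately arithmetic, and a rough sketch makes the stakes concrete. Every input below is hypothetical (model size, tokens per query, utilisation, chip pricing); only the roughly-2-FLOPs-per-parameter-per-token estimate is a standard rule of thumb.

```python
# Back-of-envelope inference economics, with entirely hypothetical numbers.
# The point: at billions of queries per day, cost per query is what matters.

PARAMS = 200e9            # model parameters (hypothetical)
TOKENS_PER_QUERY = 700    # prompt + response tokens (hypothetical)
FLOPS_PER_TOKEN = 2 * PARAMS  # ~2 FLOPs per parameter per token (rule of thumb)

CHIP_PEAK_FLOPS = 4.6e15   # FP8 peak for one accelerator (hypothetical)
UTILIZATION = 0.3          # fraction of peak actually achieved on inference
CHIP_COST_PER_HOUR = 3.00  # fully loaded $/chip-hour (hypothetical)

flops_per_query = TOKENS_PER_QUERY * FLOPS_PER_TOKEN
queries_per_chip_second = CHIP_PEAK_FLOPS * UTILIZATION / flops_per_query
cost_per_query = CHIP_COST_PER_HOUR / 3600 / queries_per_chip_second

print(f"{queries_per_chip_second:.1f} queries/s per chip")
print(f"${cost_per_query * 1e4:.2f} per 10,000 queries")

# At 1 billion queries/day, small per-query savings compound quickly:
print(f"~${1e9 * cost_per_query:,.0f}/day at 1B queries/day")
```

Even with invented inputs, the structure of the calculation shows why a 20-30% cheaper inference chip matters more at serving scale than any single training win.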
[3]
Google developing inference AI chips to rival Nvidia
Google $GOOGL is developing new chips dedicated to AI inference in partnership with Marvell Technology, positioning Alphabet to more directly compete with Nvidia $NVDA in a semiconductor category driven by surging demand for AI software, according to Bloomberg. After a model is trained, inference is the stage where it actually does its job -- fielding queries and producing outputs. Google plans to announce a new generation of its tensor processing units, known as TPUs, at the Google Cloud Next conference in Las Vegas this week, with inference-focused chips expected to follow. "The battleground is shifting towards inference," Gartner analyst Chirag Dekate told Bloomberg. Google Chief Scientist Jeff Dean said in an interview that as AI demand grows, "it now becomes sensible to specialize chips more for training or more for inference workloads." Amin Vahdat, who oversees Google's AI infrastructure and chip work, declined to comment on specific inference chip plans but said more details would likely be shared "in the relatively near future." According to Partha Ranganathan, a vice president and engineering fellow at the company, Google weighed the idea of distinct training and inference chips in its early days before ultimately deciding against it. That approach may be changing as the broader AI spending cycle shifts from training toward inference workloads. Entering the inference market, Google can draw on advantages built over years of in-house chip development, substantial revenue from its search business, and an unusually close relationship with the AI models its hardware is meant to run. No other leading AI developer manufactures its own chips at comparable volume, a structural edge that tightens the loop between the people building Google's models and those designing the silicon they run on. Demand for Google's TPUs has grown substantially. Meta $META struck a multibillion-dollar agreement to procure TPUs via Google Cloud, and Santosh Janardhan, who leads Meta's infrastructure operations, said that initial results point to possible performance gains on inference tasks. Anthropic, which expanded its TPU access to as many as 1 million chips, also signed a separate deal with Broadcom $AVGO -- Google's TPU manufacturing partner -- for chips enabling roughly 3.5 gigawatts of computing power starting in 2027. A person familiar with the matter told Bloomberg that Google has been piloting an arrangement under which enterprise customers, Anthropic among them, could deploy TPU hardware on-premises instead of relying solely on Google's cloud infrastructure. The company has also opened TPU access to outside tools such as PyTorch, moving away from a purely proprietary software environment. Nvidia is still the leader in AI chips, especially for training. Nvidia CEO Jensen Huang said at the company's GTC conference earlier this year that its chips can handle applications "you can't do with TPUs." Google uses both TPUs and Nvidia GPUs for its own AI projects. Supply constraints may complicate Google's ambitions. An unnamed startup executive described chip scarcity as a real obstacle, telling Bloomberg the company had little access to TPUs. Hassabis, for his part, confirmed that available supply is being steered toward leading AI organizations -- the cohort he described as "the more elite teams."
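For a sense of what "opened TPU access to outside tools such as PyTorch" means in practice, here is a minimal sketch using torch_xla, the real PyTorch/XLA bridge that targets TPUs; the toy model and tensor shapes are placeholders, not anything Google ships.

```python
# Minimal sketch: running a PyTorch model on a TPU via torch_xla.
# torch_xla is the actual PyTorch/XLA bridge; the model is a placeholder.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()          # resolves to the attached TPU core

model = nn.Sequential(            # stand-in for a real model
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).to(device)

x = torch.randn(8, 1024, device=device)
with torch.no_grad():
    y = model(x)                  # ops are staged as an XLA graph...
xm.mark_step()                    # ...then compiled and executed here

print(y.shape)
```

The practical point is that inference code written against standard PyTorch modules can target a TPU by changing the device, rather than being rewritten for a Google-only stack.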
[4]
Google Splits TPUv8 Strategy Into Two Chips, Handing Broadcom Training and MediaTek Inference Duties
Google is preparing two new chips in its TPUv8 family, one for training and one for inference AI workloads.

Reports indicate that Google is working on not one, not two, but three chips that will form the basis of its next-gen TPU and AI ventures. We already discussed two of these chips, the memory processing unit and the next-gen TPU series. Now, more details have emerged about what to expect from the next-gen TPU series.

The TPUv8 AI chip family will replace the existing TPUv7 "Ironwood" lineup that Google has been offering since 2025. The two new chips expected next week are the TPUv8t and TPUv8i, and the division of labor between them is straightforward: the TPUv8i, codenamed "Zebrafish," is a cost-efficient inference accelerator designed by MediaTek, while the TPUv8t, codenamed "Sunfish," is a high-performance training accelerator designed by Broadcom.

Notice that neither chip is being designed by Marvell, which was reportedly working with Google on its next-gen TPU family. That suggests the Marvell collaboration involves either a custom TPU solution or a later, post-TPUv8 series.

Both Google TPUv8-series AI chips will be tightly integrated with the company's Axion Arm CPUs. Based on the Neoverse N3 Armv9.2 core architecture, Axion processors have been deployed since 2024 and will be the go-to host CPUs for the next-gen lineup.

"Google's latest v8 architecture Tensor Processing Unit (TPU) is expected to be unveiled this week, which will not only boost the semiconductor and assembly supply chain markets, but also create a series of upgrade opportunities for peripheral components, including OCS all-optical switches, liquid cooling, power supplies, and optical communications companies." -- via UDN

The upcoming Google TPU family is expected to give the broader semiconductor market a lift. At the same time, overall supply will likely tighten further as massive orders come in to power Google's worldwide servers and AI ecosystem.
[5]
Google Bets on New Chips to Boost AI Results, Challenging Nvidia
In a matter of months, Google's AI chips have become one of the hottest commodities in the tech sector. Leading artificial intelligence developers, including some of the firm's biggest rivals, are stocking up on them. Now, the Alphabet (GOOG)-owned company aims to build on its momentum with the likely introduction of new chips dedicated to inference, or running AI models after they've been trained. With this push, Google is poised to further challenge market leader Nvidia (NVDA) in a fast-growing category for semiconductors that's fueled by surging adoption of AI software.

Bloomberg News AI Infrastructure Reporter Dina Bass joins Bloomberg Businessweek Daily to discuss. She speaks with Carol Massar and Tim Stenovec.
Google is assembling the AI industry's most diversified custom chip supply chain, partnering with Broadcom, MediaTek, and Marvell on accelerator designs and with Intel on supporting infrastructure. The move positions Google's Tensor Processing Units as a direct challenge to Nvidia's dominance as the battleground shifts from training to inference workloads, where cost per query determines AI business economics.
Google is developing new AI chips dedicated to inference workloads, marking a strategic shift that positions the company to directly challenge Nvidia in the fastest-growing segment of the AI semiconductor market. After months of surging demand for its Tensor Processing Units (TPUs), Google plans to announce its next-generation custom-designed chips at the Google Cloud Next conference in Las Vegas this week [1]. The company's Chief Scientist Jeff Dean explained that as demand grows for quickly processing AI queries, "it now becomes sensible to specialize chips more for training or more for inference workloads" [3].

[Image: Market Screener]
Inference represents the stage where AI models actually perform their jobs, fielding queries and producing outputs after training is complete. While Google has previously touted inference capabilities for its chips, the company initially resisted releasing separate chips for training and inference [1]. That approach is changing as the AI spending boom moves from training to inference, with Gartner analyst Chirag Dekate noting that "the battleground is shifting towards inference" [2].

Google is assembling the AI industry's most diversified custom chip supply chain, involving four distinct design partners: Broadcom, MediaTek, Marvell, and Intel [2]. The TPUv8 strategy splits the next generation explicitly, with Broadcom's "Sunfish" chip handling training workloads and MediaTek's "Zebrafish" chip targeting cost-efficient inference at 20-30% lower cost [4]. Both chips target TSMC's 2-nanometre process node for deployment in late 2027 [2].

[Image: Wccftech]
Broadcom, which signed a long-term agreement on April 6 to supply TPUs and networking components through 2031, commands more than 70% of the custom AI accelerator market and projects $100 billion in AI chip revenue by 2027 [2]. MediaTek's involvement began with I/O modules and peripheral components on Ironwood, Google's seventh-generation TPU, where its designs run 20 to 30% cheaper than alternatives [2]. Marvell Technology is in talks with Google to develop a memory processing unit and a new inference-focused TPU, with plans to produce nearly two million of the memory processing units [2].

Google brings unique strengths to the competitive landscape challenging Nvidia, including a decade of experience designing chips, vast resources from its online search profits, and firsthand insights on AI models [1]. Among top AI developers, only Google makes its own chips at significant scale, allowing it to share vital feedback between teams to better customize hardware. Demis Hassabis, CEO of Google DeepMind, told Bloomberg that interest in TPUs is particularly high from leading AI labs [1].

[Image: Bloomberg]
The current Ironwood TPU delivers ten times the peak performance of the TPU v5p, offers 192 gigabytes of HBM3E memory per chip with 7.2 terabytes per second of bandwidth, and scales to 9,216 liquid-cooled chips in a single superpod producing 42.5 FP8 exaflops [2]. Google plans to produce millions of units this year, with Anthropic committing to up to 1 million TPUs under an expanded agreement unveiled in October [2]. Meta Platforms also signed a multibillion-dollar deal to use TPUs through Google Cloud over several years, with Santosh Janardhan, Meta's head of infrastructure, noting that "it does look like there might be inference advantages" [1].
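Those pod-level figures imply per-chip numbers that can be sanity-checked from the quoted specs alone; the ratios below are back-of-envelope derivations, not official per-chip ratings.

```python
# Sanity-check the quoted Ironwood pod figures (back-of-envelope only;
# derived from the numbers above, not official per-chip specifications).

POD_CHIPS = 9_216
POD_FP8_EXAFLOPS = 42.5
HBM_PER_CHIP_GB = 192
BW_PER_CHIP_TBPS = 7.2

per_chip_pflops = POD_FP8_EXAFLOPS * 1_000 / POD_CHIPS  # exa -> peta
pod_hbm_tb = POD_CHIPS * HBM_PER_CHIP_GB / 1_000        # GB -> TB
pod_bw_pbps = POD_CHIPS * BW_PER_CHIP_TBPS / 1_000      # TB/s -> PB/s

print(f"~{per_chip_pflops:.1f} PFLOPs FP8 per chip")    # ~4.6
print(f"~{pod_hbm_tb:,.0f} TB of HBM per pod")          # ~1,769
print(f"~{pod_bw_pbps:.1f} PB/s aggregate HBM bandwidth per pod")  # ~66.4
```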
The shift from training to inference as the dominant AI compute cost is the strategic premise behind Google's entire chip programme. Training a frontier model is a singular, intensive event, while inference is continuous and scales with every user, every query, and every product that incorporates AI [2]. Google serves billions of AI-augmented search queries, Gemini conversations, and Cloud AI API calls daily. At that scale, the cost per inference determines the economics of the entire AI business.

Natalie Serrino, co-founder at Gimlet Labs, a startup that makes software for routing AI tasks to the best chip for each job, said today's TPUs are a strong choice for processing results for the emerging crop of AI agents that field more complex work on a user's behalf: "They are very good tools for the workload that is exploding" [1].
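Gimlet's routing software is proprietary, so as a purely hypothetical illustration of what "routing AI tasks to the best chip for each job" can mean, a dispatcher might simply pick the accelerator with the lowest estimated cost per task; every chip name and number below is invented.

```python
# Hypothetical illustration of per-task chip routing (not Gimlet's actual
# system): pick the accelerator with the lowest estimated $/task.

from dataclasses import dataclass

@dataclass
class Chip:
    name: str
    dollars_per_hour: float
    throughput: dict  # estimated tasks/hour per workload class (invented)

FLEET = [
    Chip("gpu-general", 6.0, {"training": 90, "chat": 1_000, "agent": 400}),
    Chip("tpu-inference", 2.5, {"training": 30, "chat": 1_200, "agent": 700}),
    Chip("tpu-training", 5.0, {"training": 120, "chat": 800, "agent": 300}),
]

def route(workload: str) -> Chip:
    """Return the chip with the lowest estimated $/task for this workload."""
    return min(FLEET, key=lambda c: c.dollars_per_hour / c.throughput[workload])

for w in ("training", "chat", "agent"):
    best = route(w)
    cost = best.dollars_per_hour / best.throughput[w]
    print(f"{w:>8} -> {best.name} (~${cost:.4f}/task)")
```

With these invented numbers, training tasks land on the training-optimised part while chat and agent traffic lands on the cheaper inference part, which is the economic logic the article describes.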
Nvidia's GPUs remain dominant for training workloads, where programmability and the CUDA software ecosystem create switching costs that custom chips cannot easily replicate [2]. However, inference workloads are more predictable, more repetitive, and more amenable to the kind of fixed-function optimization that custom silicon excels at.

While Nvidia CEO Jensen Huang stressed at the company's GTC conference that its chips can handle applications "you can't do with TPUs," Google uses both TPUs and GPUs for its own AI projects [3]. Supply constraints may complicate Google's ambitions, with an unnamed startup executive describing chip scarcity as a real obstacle [3]. However, available supply is being steered toward leading AI organizations, according to Hassabis, who described them as "the more elite teams" [3]. The upcoming TPU family is expected to boost semiconductor and assembly supply-chain markets while creating upgrade opportunities for peripheral components, including all-optical switches, liquid cooling, power supplies, and optical communications companies [4].
Summarized by Navi