Google splits eighth-gen TPU into training and inference chips to challenge Nvidia dominance

Reviewed by Nidhi Govil


Google unveiled its eighth-generation Tensor Processing Units at Cloud Next, marking a strategic shift by splitting capabilities into two specialized chips. The TPU 8t targets AI model training with 2.8x performance gains, while TPU 8i focuses on inference with 80% better performance per dollar. Both chips ditch x86 for custom Axion ARM CPUs, signaling Google's push for full-stack efficiency in the agentic era.

Google TPU splits into specialized chips for training and inference

Google announced a fundamental shift in its AI hardware strategy at Cloud Next in Las Vegas, unveiling eighth-generation Tensor Processing Units that separate training and inference workloads for the first time [1]. The company introduced the TPU 8t for model training and the TPU 8i for inference, positioning these custom-built AI chips as purpose-built solutions for what it calls the "agentic era of AI" [1]. This dual-track approach mirrors strategies from Amazon Web Services, which recognized early that specialized AI hardware could eliminate bottlenecks specific to each workload [3]. Google claims the TPU 8t delivers up to 2.8x faster AI model training compared to last year's Ironwood TPUs, while the TPU 8i provides 80% better performance per dollar for large language model inference [2].

Source: Wccftech

Massive scale defines TPU 8t training capabilities

The TPU 8t for model training represents Google's commitment to reducing frontier model development from months to weeks [1]. Updated server clusters, called "pods," now house 9,600 chips with two petabytes of shared high-bandwidth memory, delivering 121 FP4 EFlops of compute per pod, nearly three times Ironwood's training compute ceiling [1]. Each AI accelerator features 216 GB of high-bandwidth memory with 6.5 TB/s of bandwidth, 128 MB of on-chip SRAM, and up to 12.6 petaFLOPS of 4-bit floating-point compute [3]. Google claims the TPU 8t can scale linearly to as many as one million chips in a single logical cluster, using optical-circuit switches to connect up to 9,600 AI accelerators in a unified pod [3]. Multiple pods connect via the new Virgo Network in a flat two-tier topology, supporting up to 134,000 TPUs per data center [3].
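The headline pod figures follow directly from the per-chip specs. A quick sanity check of the arithmetic, using only the numbers quoted above (illustrative, not Google's own accounting):

```python
# Sanity-check the quoted TPU 8t pod figures from the per-chip specs.
chips_per_pod = 9_600
hbm_per_chip_gb = 216          # GB of high-bandwidth memory per chip
fp4_pflops_per_chip = 12.6     # petaFLOPS of FP4 compute per chip

pod_hbm_pb = chips_per_pod * hbm_per_chip_gb / 1_000_000      # GB -> PB (decimal)
pod_fp4_eflops = chips_per_pod * fp4_pflops_per_chip / 1_000  # PFLOPS -> EFLOPS

print(f"Pod HBM: {pod_hbm_pb:.2f} PB")              # ~2.07 PB ("two petabytes")
print(f"Pod compute: {pod_fp4_eflops:.0f} EFlops")  # ~121 EFlops
```

Both derived values land on the figures Google quotes, so the per-chip and per-pod numbers are mutually consistent.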

Source: Google

Efficiency gains target goodpute and power consumption

Google emphasizes a "goodpute" rate of 97 percent for the TPU 8t, meaning the chips spend more time actively advancing AI model training rather than waiting or handling faults [1]. Mark Lohmeyer, Google's vice president of compute and AI infrastructure, explained that at frontier training scale, every percentage point can translate into days of active training time [3]. The eighth-gen chips offer roughly twice the performance per watt of Ironwood, with the TPU 8t delivering a 124% gain and the TPU 8i a 117% gain [4]. Google Cloud has also developed a Managed Lustre storage system capable of delivering 10 TB/s of aggregate data directly into accelerator memory [3]. Data centers co-designed with the TPUs feature integrated networking on a single chip and more efficient pod layouts, reportedly increasing computing power per unit of electricity sixfold [1].
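Lohmeyer's point about percentage points and days can be made concrete with a back-of-the-envelope sketch. The 60-day run length below is a hypothetical, not a figure from Google:

```python
# Back-of-the-envelope: what one percentage point of "goodpute" is worth.
run_days = 60        # hypothetical length of a frontier training run
goodpute = 0.97      # fraction of wall-clock time doing useful training work

useful_days = run_days * goodpute
days_per_point = run_days * 0.01   # wall-clock days per percentage point

print(f"Useful training time: {useful_days:.1f} of {run_days} days")
print(f"Each goodpute point is worth {days_per_point:.2f} days")
```

On a run of this length, lifting goodpute from, say, 94% to 97% recovers nearly two days of training time, which is why storage, networking, and fault handling are co-designed around the chips.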

TPU 8i optimizes inference for agent workloads

The TPU 8i for inference addresses the ongoing use of models after users submit prompts, a workload fundamentally different from training [2]. Google tripled the on-chip SRAM to 384 MB per chip, an SRAM-heavy design reminiscent of Groq's LPU hardware [5]. This larger cache allows the TPU 8i to keep more key-value information on-chip, speeding up models with longer context windows [1]. Inference pods now contain 1,152 chips versus just 256 for Ironwood inference clusters, delivering 11.6 EFlops per pod [1]. The architecture is designed "to deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively," according to Alphabet CEO Sundar Pichai [5]. Lohmeyer noted that "the number of transactions is going way up, and the cost per transaction needs to go way down for it to scale."
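Why the bigger SRAM helps with long contexts: each generated token must re-read the key-value (KV) cache, so the more of it that fits on-chip, the less traffic goes out to HBM. A rough sizing sketch with hypothetical transformer dimensions (none of these numbers describe an actual Google or Gemini model):

```python
# Rough KV-cache sizing for a hypothetical transformer (illustrative only).
layers = 32
kv_heads = 8
head_dim = 128
bytes_per_value = 2   # BF16

# Keys and values, per token, summed over all layers.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value

for sram_mb in (128, 384):   # Ironwood-era vs TPU 8i on-chip SRAM
    tokens = sram_mb * 1024**2 // kv_bytes_per_token
    print(f"{sram_mb} MB SRAM holds KV cache for ~{tokens} tokens")
```

Tripling the SRAM triples the slice of KV cache that stays on-chip. Real systems also shard the cache across chips and spill to HBM, so this is only a first-order picture of the benefit.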

Axion CPUs replace x86 in full-stack ARM approach

The eighth-gen AI accelerators are the first from Google to rely solely on its custom Axion ARM CPU as host, pairing one CPU with every two TPUs compared to Ironwood's ratio of one x86 CPU servicing four TPU chips [1]. This "full-stack" ARM-based approach allows for greater efficiency, following a path similar to Amazon's integration of Graviton and Trainium 3 earlier this year [3]. Google has also adapted its fourth-gen liquid cooling setup to the new chips, using actively controlled valves to adjust water flow based on workload [1]. Both new TPUs support frameworks developers already use, including JAX, MaxText, PyTorch, SGLang, and vLLM [1].
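For JAX in particular, TPU support is largely transparent: the same XLA-compiled function runs on TPU, GPU, or CPU depending on which backend is available. A minimal sketch (the function is an arbitrary example, not Google sample code):

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA-compiles for whatever backend is available (TPU, GPU, or CPU)
def attention_scores(q, k):
    # Scaled dot-product attention scores: (T, d) x (S, d) -> (T, S)
    return jnp.einsum("td,sd->ts", q, k) / jnp.sqrt(q.shape[-1])

q = jnp.ones((4, 8))
k = jnp.ones((4, 8))
print(attention_scores(q, k).shape)  # (4, 4)
```

The higher-level frameworks Google lists, such as PyTorch and vLLM, typically reach TPUs through their own XLA-based backends, so no model rewrite is implied by the switch to the new chips.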

Market positioning against Nvidia and hyperscalers

Google is not replacing Nvidia entirely: it continues to offer services based on Nvidia chips and has promised to deploy the upcoming Vera Rubin chip later this year [2]. The company announced a collaboration with Nvidia to engineer networking that lets Nvidia-based systems run more efficiently in Google's cloud infrastructure, particularly by enhancing the software-based networking technology called Falcon [2]. Nvidia's stock price briefly dropped about 1.5 percent after Google's announcement [1]. DA Davidson analysts estimated in September that the Google TPU business, coupled with Google DeepMind, would be worth about $900 billion [5]. Adoption is ramping up: Citadel Securities is building quantitative research software on TPUs, all 17 U.S. Energy Department national laboratories use AI co-scientist software built on the chips, and Anthropic has committed to multiple gigawatts' worth of Google TPUs [5]. Both chips will power Google's Gemini-based agents and become generally available later this year [1].

Source: Benzinga
