24 Sources
[1]
Google unveils two new TPUs designed for the "agentic era"
Most of the companies that have fully committed to building AI models are gobbling up every Nvidia AI accelerator they can get, but Google has taken a different approach. Most of its cloud AI infrastructure is based on its line of custom tensor processing units (TPUs). After announcing the seventh-gen Ironwood TPU in 2025, the company has moved on to the eighth-gen version, but it's not just a faster iteration of the same chip. The new TPUs come in two flavors, providing Google and its customers with an AI platform that is faster and more efficient, the company says. Google is pushing the idea that the "agent era" is fundamentally different from the AI systems that came before, necessitating a new approach to the hardware. So engineers have devised the TPU 8t (for training) and the TPU 8i (for inference).

Before AI models become something you can use to analyze data or make silly memes, they need to be trained. The TPU 8t was designed specifically for this part of the AI lifecycle to reduce the training time for frontier AI models from months to weeks. Updated TPU 8t server clusters, which Google calls "pods," now house 9,600 chips with two petabytes of shared high-bandwidth memory. Google claims TPU 8t can even scale linearly, with up to a million chips in a single logical cluster. It's innovations like this that are making super-sized AI models much faster while also driving up RAM prices for everyone else. But if you're involved in building those giant AI models, all this hardware saves time, with an impressive 121 FP4 EFlops of compute per pod. That's almost three times higher than Ironwood's training compute ceiling.

So the new chips allow for faster training, but Google also says you get more useful computation for every watt you pump into a TPU 8t. The company claims a "goodput" rate of 97 percent, which means less waiting and wasted effort. With better handling of irregular memory access, automatic recovery from hardware faults, and real-time telemetry across all connected chips, TPU 8t spends more time actively advancing model training.

When training is done, AI models run in inference mode to generate tokens -- that's the process happening behind the scenes when you tell a model to do something. This doesn't require as much horsepower, so using the same hardware for both parts of the AI lifecycle is inefficient. That's why inference is the purview of TPU 8i, which is designed to be more efficient when running multiple specialized agents, with less waiting time. TPU 8i chips also run in larger pods of 1,152 chips versus just 256 for the last-gen Ironwood inference clusters. That works out to 11.6 EFlops per pod, much lower than TPU 8t pods. Google has tripled the amount of on-chip SRAM for each TPU 8i to 384 MB. This allows the company's new chips to keep a larger key-value cache on the chip, speeding up models with longer context windows. The eighth-gen AI accelerators are also the first from Google to rely solely on Google's custom Axion ARM CPU host, featuring one CPU for every two TPUs. In Ironwood, each x86 CPU serviced four TPU chips. Google says this "full-stack" ARM-based approach allows for much greater efficiency.

An efficiency play

It makes sense that efficiency is a core part of Google's new TPU setup. Training and running frontier AI models is expensive, and the return on investment is unclear. Companies are still burning money on generative AI in the hopes that efficiency will turn the corner at some point.
Maybe Google's new TPUs will help get there and maybe not, but the company has made notable improvements. Generative AI systems consume a lot of power, which is often cited as one of the primary reasons not to use them. The eighth-gen TPUs don't exactly sip power, but Google claims the chips offer twice the performance per watt compared to Ironwood. Google also touts improvements in its data centers, which are apparently "co-designed" with TPUs. Features like integrating networking with compute on a single chip and more efficient pod layouts have reportedly increased computing power per unit of electricity by six times. Of course, that doesn't mean data centers will use less power, just that they get more compute for all the power they use. Water usage for cooling data centers is also a big efficiency concern. The heat generated by the dense computing requirements of AI servers cannot be dissipated with air, so liquid cooling is the only way. Google has adapted its fourth-gen liquid-cooling setup to the new chips, using actively controlled valves to adjust water flow based on workload. Again, this is supposed to be more efficient. The TPU 8t and TPU 8i will power Google's Gemini-based agents in the future, but they are also designed with third-party developers in mind. Both new TPUs support the frameworks developers already use, including JAX, MaxText, PyTorch, SGLang, and vLLM. Nvidia's stock price briefly dropped about 1.5 percent after Google's announcement, but it has since recovered and is again over $200 per share. Surging demand for AI accelerators has more than doubled Nvidia's value over the past year, and Google's gains have been even greater. Such is the nature of the potential AI bubble. Of course, the companies benefiting most don't see it as a bubble -- they see this as the beginning of an agentic AI future.
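To put the pod-level figures in this piece into rough per-chip terms, here is a quick back-of-the-envelope check in plain Python, using only the numbers quoted above. The results happen to line up with the roughly 216 GB of HBM and 12.6 petaFLOPS per chip reported in other sources below, but they are derived estimates, not official per-chip specifications.

```python
# Per-chip estimates derived from the pod-level TPU 8t figures quoted above.
# These are back-of-the-envelope derivations, not official specifications.
chips_per_pod = 9_600          # TPU 8t chips in one training pod
pod_hbm_bytes = 2e15           # "two petabytes of shared high-bandwidth memory"
pod_fp4_flops = 121e18         # 121 FP4 EFlops of compute per pod

print(f"HBM per chip: ~{pod_hbm_bytes / chips_per_pod / 1e9:.0f} GB")        # ~208 GB
print(f"FP4 per chip: ~{pod_fp4_flops / chips_per_pod / 1e15:.1f} PFLOPS")   # ~12.6 PFLOPS
print(f"vs Ironwood:  {121 / 42.5:.2f}x the per-pod training compute")       # ~2.85x, i.e. "almost three times"
```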
[2]
Google Cloud launches two new AI chips to compete with Nvidia | TechCrunch
Google Cloud on Wednesday announced that its eighth generation of custom-built AI chips, or tensor processing units (TPUs), will be split in two. One chip, named the TPU 8t, will be geared for model training, and another, the TPU 8i, is aimed at inference. Inference is the ongoing usage of models, aka what happens after users submit prompts. As you might expect, the company touts some impressive performance specs for these new TPUs compared to the previous generations: up to 3x faster AI model training, 80% better performance per dollar, and the ability to get 1 million+ TPUs to work together in a single cluster. The upshot should be a lot more compute for a lot less energy -- and cost to customers -- than previous versions. It calls these chips TPUs, not GPUs, because its custom low-power chips were originally named Tensor.

But Google's chips are not a full frontal assault on Nvidia's future, at least not yet. Like the other giant cloud providers, including Microsoft and Amazon, Google is using these chips to supplement the Nvidia-based systems it offers in its infrastructure. It is not flat-out replacing Nvidia. In fact, Google promises its cloud will have Nvidia's latest chip, Vera Rubin, available later this year. One day the hyperscalers building their own AI chips (which includes Amazon, Microsoft and Google) may grow to need Nvidia less, as enterprises move their AI needs to their clouds and port their apps to these chips. Still, as things stand today, it's not profitable to bet against Nvidia. As notable chip market analyst Patrick Moore jokingly posted on X, he had predicted that Google's TPU could be bad news for Nvidia (and Intel) back in 2016 when the search giant launched its first one. Nvidia is now a nearly $5 trillion market cap company, meaning that prediction didn't exactly hold up to the test of time.

If all goes according to Nvidia's plan, Google's growth as an AI cloud provider would result in more business for the chip maker, not less, even if many a workload runs on Google's chips. In fact, Google also says it has agreed to work with Nvidia to engineer computer networking that allows Nvidia-based systems to perform even more efficiently in its cloud. In particular, the two tech giants are working to beef up the software-based networking tech called Falcon, which Google created and open sourced in 2023 under the godfather of all open source data center hardware organizations, the Open Compute Project.
[3]
Google dual tracks TPU 8 to conquer training and inference
x86 gets the boot as Google pairs up its TPUs with some Arm-based Axion cores

Google unveiled two new in-house AI accelerators at its annual Cloud Next conference in Las Vegas on Wednesday: one designed to speed up training and another aimed at driving down model serving costs. The Chocolate Factory boasts its eighth-gen tensor processing units are as much as 2.8x faster in training and deliver 80 percent higher performance per dollar for LLM inference compared with last year's Ironwood TPUs. To achieve this, Google has dual-tracked its accelerator development, building the TPU 8t for training and TPU 8i for inference. While these chips are built on similar foundations, each is specifically aimed at eliminating bottlenecks in its respective workload.

Google isn't the first to go down this road. Early in its AI chip development, Amazon Web Services recognized the need for inference- and training-optimized accelerators. Nvidia has also dabbled with this kind of specialization, though not to the same extent. The GPU slinger's Blackwell Ultra generation was optimized specifically for AI inference, trading high-precision grunt for a 50 percent jump in memory and FP4 compute over its Blackwell sibling. In addition to dual tracking, Google is ditching x86 processors in favor of its homegrown Arm-based Axion CPUs for its TPU host. Amazon did something similar with Graviton and Trainium 3 earlier this year.

Google's approach to specialization goes much deeper than the TPUs themselves. The search and advertising giant has developed new clusters with distinct network topologies to minimize scaling losses across both inference and training. Modern AI workloads rarely run on a single accelerator, so the ability to efficiently scale those workloads across multiple chips is often more important than how fast each one is.

Like its past accelerators, Google's TPU 8t is designed for training at a massive scale. The company may not draw the same hype as OpenAI or Anthropic, but Google remains a prolific model builder. Under the hood, Google has tweaked the mix of vector, matrix multiplication, and SparseCore embedding accelerators introduced with its Ironwood chips to maximize effective floating point throughput. Each accelerator features 216 GB of high-bandwidth memory (HBM) good for 6.5 TB/s of bandwidth, 128 MB of on-chip SRAM, up to 12.6 petaFLOPS of 4-bit floating point compute, and up to 19.2 Tbps of chip-to-chip bandwidth.

Compared to Nvidia's Rubin GPUs, which Google also announced plans to deploy across its cloud infrastructure, the new TPU might look a little tame. Rubin boasts up to 35 petaFLOPS of FP4 training performance and 288 GB of HBM4 good for 22 TB/s of bandwidth. So individually, Nvidia's GPUs are faster, but it doesn't really matter: When training a new frontier model, you're not using one GPU, you're using thousands. And when it comes to scale, Google has the advantage. Nvidia's latest GPUs support up to 576 accelerators in a single NVLink domain before having to scale out over Ethernet or InfiniBand. The TPU 8t, on the other hand, uses optical-circuit switches, an opto-mechanical switching technology that works more like a telephone switchboard than a packet switch, to connect up to 9,600 accelerators in a single unified pod. Multiple pods are then stitched together using its new Virgo Network to support even larger compute domains. Rather than just building a bigger chip-to-chip mesh, Google appears to be using packet switches with extremely high port densities.
As we understand it, these are arranged in a flat-ish two-tier all-to-all topology capable of connecting up to 134,000 TPUs per datacenter and up to a million TPUs when connecting multiple sites. Or so Google claims. Alongside the new network fabric, Google has also developed a Managed Lustre storage system capable of delivering 10 TB/s of aggregate data directly into its accelerators' memory. According to Google, these technologies, combined with improved reliability, availability, and serviceability (RAS) capabilities, will enable its training-optimized TPUs to operate at a "goodput" of 97 percent. In this case, goodput means the amount of time the TPUs actually spend training. "Every hardware failure, network stall, or checkpoint restart is time the cluster is not training, and at frontier training scale, every percentage point can translate into days of active training time," the company explained in a blog post.

Inference is an auto-regressive workload, which means that for each token generated, the entire model's active weights need to be streamed through memory. While compute is still important, the main bottleneck tends to be memory bandwidth. For its inference-focused TPU 8i, Google is trading some FLOPS for a much larger SRAM cache and a faster, higher-capacity memory pool. The chip is roughly comparable to Nvidia's Blackwell accelerators on paper. The TPU 8i features 10.1 petaFLOPS of FP4 compute fed by 384 MB of on-chip SRAM, and 288 GB of HBM good for 8.6 TB/s of bandwidth. Interconnect bandwidth, meanwhile, is unchanged from its training-focused sibling. According to Google, all that SRAM helps the TPU 8i keep more of the key-value cache (the model's short-term memory) resident on chip and reduce the amount of time the cores are left waiting for data.

TPU 8i also ditches Google's SparseCores in favor of a collective acceleration engine (CAE). As the name suggests, the accelerator block speeds up inference by offloading collective communications, like all-reduce or all-gather, shortening synchronization stalls that would otherwise leave the chip's tensor cores sitting idle. In fact, Google's focus with TPU 8i appears to be killing as much latency across the inference stack as it can. Collective communications have become quite problematic as mixture-of-experts (MoE) architectures have become more common. These models are made up of multiple submodels, appropriately called experts, a subset of which are activated for each token generated. The benefit is models can grow larger without necessarily requiring additional memory bandwidth to maintain the same level of performance. The downside is that chip-to-chip communication is less predictable, as generating one token might use a different set of experts from the next, and those experts might be on different accelerators. Google says its CAE reduces collective communication latencies five-fold, which translates into better economics by allowing it to pack more users onto the same hardware.

Alongside its inference-optimized chips, Google has also developed a topology called Boardfly. The network arrangement is somewhat reminiscent of the Dragonfly topologies commonly employed in HPC clusters, and allows 1,152 chips (1,024 active at any given moment) to be connected using optical circuit switches. The key advantage of Boardfly is that it cuts the maximum chip-to-chip path from 16 hops in a 3D torus to just seven, further reducing latency when running MoE or reasoning models.
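The all-reduce and all-gather operations named above are standard collectives that every major framework exposes. The sketch below is a minimal JAX illustration of those two primitives, the ones the CAE reportedly offloads; it shows what the operations do, not how Google's hardware implements them, and it runs on whatever devices JAX can see, CPU included.

```python
# Minimal JAX demonstration of the collectives named above (all-gather and
# all-reduce). Purely illustrative; it says nothing about how TPU 8i's
# collective acceleration engine implements these in hardware.
from functools import partial

import jax
import jax.numpy as jnp
from jax import lax

n_dev = jax.device_count()

@partial(jax.pmap, axis_name="devices")
def combine(local_partial):
    # all-gather: every device ends up with every other device's shard.
    gathered = lax.all_gather(local_partial, axis_name="devices")
    # all-reduce (a sum here): every device gets the element-wise total.
    reduced = lax.psum(local_partial, axis_name="devices")
    return gathered, reduced

shards = jnp.arange(n_dev * 4, dtype=jnp.float32).reshape(n_dev, 4)
gathered, reduced = combine(shards)
print(gathered.shape, reduced.shape)  # (n_dev, n_dev, 4) and (n_dev, 4)
```

In a mixture-of-experts serving stack, calls like these sit on the critical path of every generated token, which is why shaving their latency in hardware translates directly into tokens per second.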
Google isn't the only one that's changed the way it builds its inference clusters to better accommodate emerging architectures. In December, AWS ditched 3D toruses for inference for similar reasons, but opted instead for a more conventional packet-switched fabric. Both TPU 8 accelerators will be generally available later this year on Google Cloud Platform as instances, or as part of the cloud provider's full-stack AI Hypercomputer platform, which bundles up all the networking, storage, compute, and software required to deploy or train LLMs at scale.
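To make the key-value-cache point above concrete, here is a rough sizing exercise. The layer count, head count, and precision below are illustrative assumptions, not the dimensions of any Google or third-party model; the only figure taken from the reporting is the 384 MB of on-chip SRAM.

```python
# Rough KV-cache sizing for a hypothetical transformer, to show why on-chip
# SRAM capacity matters for long contexts. All model dimensions are assumptions.
layers   = 48    # hypothetical decoder layers
kv_heads = 8     # grouped-query attention KV heads (assumption)
head_dim = 128
kv_bytes = 1     # 8-bit (FP8/INT8) KV cache (assumption)

# Each token stores one key vector and one value vector per layer.
bytes_per_token = 2 * layers * kv_heads * head_dim * kv_bytes
print(bytes_per_token)             # 98,304 bytes, about 96 KB per token

sram = 384 * 1024**2               # TPU 8i on-chip SRAM
print(sram // bytes_per_token)     # ~4,096 tokens of KV cache fit on chip
```

Anything beyond that spills out to HBM, which is why the reports tie the tripled SRAM to longer context windows; in practice the SRAM acts as a cache in front of HBM rather than holding the whole KV store.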
[4]
Google Cloud Releases New TPU Chip Lineup in Bid to Speed Up AI
Alphabet Inc.'s Google Cloud division unveiled the latest generation of its tensor processing unit, or TPU, a homegrown chip that's designed to make AI computing services faster and more efficient. The new lineup will come in two versions, the company said Wednesday at its Google Cloud Next event. The TPU 8t is tailored for creating artificial intelligence software, while the TPU 8i is designed to run AI services after they've been created -- a stage known as inference. Google has emerged as one of the most successful makers of in-house AI chips in an industry dominated by Nvidia Corp. TPUs have become a hot commodity in Silicon Valley in recent months, and the company is looking to build on that momentum with the latest versions. The effort is part of a broader push to make it cheaper and less energy-intensive to roll out AI software. The company also is working to make services more responsive. The new TPUs store more information on the chip, helping provide the rapid responses that users crave. But demands on increasingly complex layers of software are only growing. "It's about how you deliver the lowest possible latency of the response at the lowest possible cost per transaction," said Mark Lohmeyer, Google's vice president of compute and AI infrastructure. "The number of transactions is going way up, and the cost per transaction needs to go way down for it to scale."

Creating AI services and software is done by using systems that can sift through massive amounts of data very quickly to make connections and establish patterns that can be represented mathematically. Inference, running the software and services, benefits from processors that have huge amounts of memory integrated into them. This approach helps make AI responses more instantaneous because the component doesn't have to go seek information stored elsewhere. It's particularly useful when computers "reason" through problems, taking multiple steps and learning from their own actions.

The training chip, 8t, can be combined into groups of 9,600 semiconductors. Google said that when deploying such massive systems, power is increasingly the major constraint in data centers. Owners therefore need systems that are more efficient to get the best out of the limited availability of electricity. TPU 8t delivers 124% more performance per watt than the preceding generation, with TPU 8i providing a gain of 117%. That step-up is helped by improving in-house networking that increases the chips' ability to communicate with one another efficiently. AI systems built on the chips will be "generally available later this year," Google said in a statement. The company will continue to offer services based on Nvidia chips to customers who want to use the systems that currently dominate AI computing, it said. Google intends to be among the first to deploy gear based on a new design from Nvidia coming in the second half of the year, Lohmeyer said. Like Google, Nvidia is focusing more on the inference stage of AI.
Its forthcoming lineup will include technology from its acquisition of Groq -- technology tailored specifically for providing ultrafast responsiveness. Nvidia Chief Executive Officer Jensen Huang has said that more than 20% of AI workloads might be best served by that type of chip. Groq was founded in 2016 by a group of former Google engineers. Last December, Nvidia paid $20 billion for a license to use its technology and hired most of its engineering team.
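One small arithmetic note on the efficiency figures quoted above: "124% more" and "117% more" performance per watt are multipliers of 2.24x and 2.17x, which is how they reconcile with the roughly "twice the performance per watt" framing used in other reports collected here.

```python
# "X% more performance per watt" expressed as a multiplier over Ironwood.
tpu_8t_gain = 1 + 1.24   # 2.24x
tpu_8i_gain = 1 + 1.17   # 2.17x
print(tpu_8t_gain, tpu_8i_gain)   # both round to the "about 2x" cited elsewhere
```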
[5]
Google unveils chips for AI training and inference in latest shot at Nvidia
After years of producing chips that can both train artificial intelligence models and handle inference work, Google is separating those tasks into distinct processors, its latest effort to take on Nvidia in AI hardware. Google said Wednesday that it's making the change for the eighth generation of its tensor processing unit, or TPU. Both chips will become available later this year. "With the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving," Amin Vahdat, a Google senior vice president and chief technologist for AI and infrastructure, said in a blog post.

In March, Nvidia talked up forthcoming silicon that can enable models to rapidly respond to users' questions, thanks to technology obtained in its $20 billion deal with chip startup Groq. Google is a large Nvidia customer, but offers TPUs as an alternative for companies that use its cloud services. Most of the world's top technology companies are pursuing custom semiconductor development for artificial intelligence to maximize efficiency and so they can build for specialized use cases. Apple has included neural engine AI components in its in-house iPhone chips for years. Microsoft announced a second-generation AI chip in January. Last week, Meta said it's working with Broadcom to develop multiple versions of AI processors.

Google was early to the trend. In 2015, the company started using processors it had designed for running AI models, and began renting them to cloud clients in 2018. Amazon Web Services announced the Inferentia chip for handling AI requests in 2018, and unveiled the Trainium processor for training AI models in 2020. DA Davidson analysts estimated in September that the TPU business, coupled with the Google DeepMind AI group, would be worth about $900 billion.

None of the tech giants are displacing Nvidia, and Google isn't even comparing the performance of its new chips with those from the AI chip leader. Google did say the training chip enables 2.8 times the performance of the seventh-generation Ironwood TPU, announced in November, for the same price, while performance is 80% better for the inference processor. Nvidia said its upcoming Groq 3 LPU hardware will draw on large quantities of static random-access memory, or SRAM, which is also used by Cerebras, an AI chipmaker that filed to go public earlier this month. Google's new inference chip, dubbed TPU 8i, relies on SRAM as well. Each chip contains 384 MB of SRAM, triple the amount in Ironwood. The architecture is designed "to deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively," Sundar Pichai, CEO of Google parent Alphabet, wrote in a blog post.

Adoption of Google's AI chips is ramping up. Citadel Securities built quantitative research software that draws on Google's TPUs, and all 17 U.S. Energy Department national laboratories use AI co-scientist software built on the chips, Google said. Anthropic has committed to using multiple gigawatts worth of Google TPUs.
[6]
Google Eyes New Chips to Speed Up AI Results, Challenging Nvidia
In a matter of months, Google's AI chips have become one of the hottest commodities in the tech sector. Leading artificial intelligence developers, including some of the firm's biggest rivals, are stocking up on them. Now, the Alphabet Inc.-owned company aims to build on its momentum with the likely introduction of new chips dedicated to inference, or running AI models after they've been trained. With this push, Google is poised to further challenge market leader Nvidia Corp. in a fast-growing category for semiconductors that's fueled by surging adoption of AI software. As demand grows for quickly processing AI queries, "it now becomes sensible to specialize chips more for training or more for inference workloads," Google Chief Scientist Jeff Dean said in an interview. "We are looking at a whole bunch of different things," he added, including the speed of AI results it wants to enable. The company plans to announce its new generation of custom-designed chips, known as tensor processing units, or TPUs, at the Google Cloud Next conference in Las Vegas this week. Amin Vahdat, who oversees Google's AI infrastructure and chip work, declined to comment on plans for an inference chip that can speed up AI outputs, but said more will likely be shared "in the relatively near future." Nvidia's graphics processing units, or GPUs, remain the gold standard for AI, particularly for training more advanced models. But a growing number of up-and-comers are vying to take on the chipmaker for inference uses, including by offering chips meant to cut down response times for chatbots and AI agents. Last month, Nvidia began selling a chip intended for faster inference based on technology it acquired from Groq as part of a reported $20 billion licensing deal. Google brings unique strengths to that competitive landscape, including a decade of experience designing chips, vast resources from its online search profits and firsthand insights on AI models. Among the top AI developers, only Google makes its own chips at significant scale, allowing it to share vital feedback between teams to better customize hardware. (OpenAI is only now starting to design its own.) In a recent podcast interview, Nvidia's Jensen Huang stressed the advantages of his company's chips, saying they can do "a whole bunch of applications" that "you can't do with TPUs." Google, for its part, relies on a mix of TPUs and GPUs for its own work. "A lot of people would like to run on both," Demis Hassabis, chief executive officer of Google DeepMind, told Bloomberg. Interest in TPUs is particularly high from leading AI labs, he said. Google has previously touted inference capabilities for its chips. It also considered releasing separate chips for training and inference early on, according to Partha Ranganathan, a vice president and engineering fellow at Google, but so far it's resisted that approach. That might change soon as the AI spending boom moves from training to inference. "The battleground is shifting towards inference," said Chirag Dekate, an analyst at Gartner, who notes that in his experience Google's Gemini model is the fastest at responding to complex reasoning tasks. "In that battleground, Google has an infrastructure advantage." Already, today's TPUs are a strong choice for processing results for the emerging crop of AI agents that field more complex work on a user's behalf, according to Natalie Serrino, co-founder at Gimlet Labs, a startup that makes software for routing AI tasks to the best chip for each job. 
"They are very good tools for the workload that is exploding," she said. An overnight success that took a decade Google's long-simmering chip efforts gained new attention in October when Anthropic PBC -- one of the most closely watched AI developers -- unveiled an expanded agreement to access as many as 1 million TPUs. The next month, Google debuted the more advanced Gemini 3 model, trained and run on TPUs, to rave reviews. Since then, demand for Google's chips has only grown among large firms. Meta Platforms Inc. signed a multibillion-dollar deal to use TPUs through Google Cloud over several years. The company just received access to its first significant supply and is testing them out to see what tasks they're best suited for, said Santosh Janardhan, Meta's head of infrastructure. "It does look like there might be inference advantages," he said, while noting that "no new platform is without hurdles and a learning curve." Anthropic also signed a deal with Broadcom Inc., Google's TPU partner, for chips that will enable it to tap into about 3.5 gigawatts of computing power starting in 2027. Citadel Securities plans to present at the Google conference about how TPUs let the company train models faster than previous work with GPUs. And G42, the Abu Dhabi technology conglomerate, has held "multiple discussions" with Google about using its TPUs, according to Talal Al Kaissi, the interim CEO of Core42, the firm's cloud unit. "I'm very bullish," Al Kaissi said about the talks. Google is already taking new steps to meet customers where they are. The company is testing out letting companies like Anthropic run some of their TPUs in their own data centers rather than Google's facilities, according to a person familiar with the matter. It has also enabled TPU customers to use outside tools like PyTorch as well as other scheduling software rather than solely relying on Google's products, Vahdat said. Those changes are helping shift perception for chips that were born out of Google's computing bottlenecks and long thought of as primarily useful for the company to meet its own needs. After Dean, Google's chief scientist, started building an earlier AI software system to let people use language translation and voice recognition services, he realized there was no way that even Google could afford to deliver it using available chips and hardware. At the same time, the central processing units Google relied on for AI were improving at a slower rate. The company decided it should build an accelerator that focused on a narrower set of tasks that might rack up the biggest bills for AI. The key idea behind the TPU is that it "solves a small number of problems but the amount of computation required for them was enormous," said Vahdat, a former computer science professor who played an early, key role in pushing Google to adopt the optical switches that help connect TPUs into supercomputers. "The conventional wisdom at the time was you don't build specialized hardware." Over the years, Google's TPUs have evolved alongside its AI work. A seminal 2017 Google research paper that gave rise to today's large language models also pushed the TPU team to focus on chips for training bigger AI systems. Later, Google DeepMind and the chips team noticed that TPUs were sitting unused too often when deployed for reinforcement learning, a popular method for improving AI systems at specific tasks. The TPU team adjusted how they network various semiconductors to get the data flowing faster and avoid chips sitting idle. 
That dynamic continues today as Google debates how many chips to link together in a single pod or whether the hardware can be less precise in order to save money. "A lot of those things are informed by the model experiments," Hassabis said. In the future, he would love the TPU team to consider making an accelerator for edge-of-network cases, where the chip is placed closer to users rather than being accessed via the cloud, to reduce latency.

Along the way, Google has also built systems to more rapidly spot manufacturing flaws that can have an outsize impact on software. When working with AI accelerator chips that manage massive amounts of math, even a subtle failure can metastasize and cause a model to "completely self-destruct," said Paul Barham, the Google distinguished scientist who co-leads the Gemini infrastructure team. An issue like that happened at Google about two years ago, and it took weeks to sort out what had gone wrong, he said, describing these as "bugs from hell." "We now have to do that with hundreds of thousands of accelerator chips within 10 seconds," he said.

The guessing game

For all its expertise in AI development, Google faces a similar challenge to other chipmakers: Chips usually take about three years to develop from start to finish, but AI models are evolving much faster. That makes it difficult to predict what customers will want several years out. "If anybody claims they know what Gemini 10 is going to look like, I'm like, 'Please give me whatever you're smoking,'" Ranganathan said. Barham also worries that the tight feedback loop between the AI model creators and the hardware designers can run the risk of missing new ideas. There's "this cycle that traps you into what works well on the current software and hardware," he said. To strike a middle ground, the TPU team sometimes aims for the chip to be good enough for various uses, even if it's not perfect for each. The other option, Vahdat said, is to plan two different designs. Both may not ship, but they could if the use case for each is compelling enough.

As Google's chips become more popular, the company risks supply constraints, not unlike Nvidia. One startup executive, who spoke on condition of anonymity to discuss internal matters, said their company's use of TPUs has been limited by availability and complained that Google had effectively given all its chips to Anthropic. "Mostly we're sort of favoring what supply we do have to the more elite teams who obviously are the ones that could maybe take the most advantage out of what the TPUs do best," Hassabis said, referring to top AI firms. Going forward, Google will also need to decide how to allocate TPUs between its own growing slate of competitive AI services and its burgeoning roster of customers. "There are benefits to making TPUs only for Google, but there are substantial downsides," Vahdat said. "Eventually you wind up on what we refer to as a tech island. It might be a beautiful island, but it's going to be limited in population and it's going to be limited in diversity. In the end, it's probably going to be less good."
[7]
Google launches Ironwood TPU and previews eighth-gen split into training and inference chips at TSMC 2nm
Summary: Google made Ironwood, its seventh-generation TPU, generally available at Cloud Next 2026 while previewing its eighth-generation architecture: TPU 8t (Sunfish), a Broadcom-designed training chip, and TPU 8i (Zebrafish), a MediaTek-designed inference chip, both targeting TSMC 2nm and late 2027. Ironwood delivers 4.6 petaFLOPS per chip and 42.5 exaFLOPS in a 9,216-chip superpod. The v8 split marks the first time Google has purpose-built separate training and inference chips, with Anthropic's deal expanding to 3.5 gigawatts of compute in 2027, making it the anchor customer for both generations. Google made its seventh-generation Tensor Processing Unit, Ironwood, generally available to cloud customers on Tuesday at Google Cloud Next in Las Vegas, positioning the chip as "the first Google TPU for the age of inference" and the centrepiece of what may be the most aggressive infrastructure investment in the company's history. Ironwood delivers 4.6 petaFLOPS of peak FP8 compute per chip, roughly four times the performance of its predecessor Trillium, with 192 gigabytes of HBM3e memory and 7.37 terabytes per second of memory bandwidth. A single Ironwood superpod links 9,216 chips into a unified system delivering 42.5 exaFLOPS of compute, more than 24 times the capacity of El Capitan, currently the world's most powerful supercomputer. The numbers position Ironwood as a direct competitor to Nvidia's Blackwell B200 on raw specifications. Both chips deliver roughly 4.5 to 4.6 petaFLOPS of FP8 compute and 192 gigabytes of HBM. Nvidia leads on single-device interconnect bandwidth, 14.4 terabits per second via NVLink compared with Ironwood's 9.6 terabits over ICI, and supports FP4 precision, which doubles inference throughput for quantised models, a capability Ironwood lacks. Google's advantage is at cluster scale: the superpod architecture, its energy efficiency at roughly twice the performance per watt of Trillium and 2.8 times that of Nvidia's H100, and the economics of running inference workloads on custom silicon designed specifically for the task rather than on general-purpose GPUs adapted for it. The emphasis on inference rather than training marks a strategic shift. Training a frontier model is a one-time capital expenditure measured in weeks or months. Inference, the process of running that model in response to every query from every user, is an ongoing operational cost that scales with demand and never stops. Google says it must double its AI serving capacity every six months to meet demand across Gemini, Search, YouTube, and Gmail. At that growth rate, the cost of inference becomes the single largest variable in the economics of AI, and the company that builds the cheapest, most efficient inference hardware captures the margin that would otherwise flow to Nvidia. Ironwood is Google's answer. It is purpose-built for the workloads that dominate production AI: large language model inference, mixture-of-experts architectures, diffusion models, and reinforcement learning. Its 192 gigabytes of HBM3e per chip allow it to hold larger model shards in memory, reducing the need to distribute a single model across multiple chips. Its 256-by-256 matrix multiply unit array, containing 65,536 multiply-accumulate operations per cycle, is optimised for the dense linear algebra that accounts for most of the compute in transformer inference. 
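A rough illustration of the "larger model shards" point above: per-chip HBM sets a floor on how many chips are needed just to hold a model's weights, before any consideration of bandwidth or latency. The parameter counts and precisions below are hypothetical examples, not specific models.

```python
# Minimum chips needed just to hold model weights in 192 GB of per-chip HBM,
# ignoring activations, KV cache, and optimizer state. Example sizes are hypothetical.
import math

HBM_PER_CHIP_GB = 192

def min_chips_for_weights(params_billions, bytes_per_param):
    weight_gb = params_billions * bytes_per_param   # 1e9 params x bytes/param = GB
    return math.ceil(weight_gb / HBM_PER_CHIP_GB)

print(min_chips_for_weights(70, 1))      # 70B params at 8-bit  -> 1 chip
print(min_chips_for_weights(400, 1))     # 400B params at 8-bit -> 3 chips
print(min_chips_for_weights(1000, 2))    # 1T params at bf16    -> 11 chips
```

In production, models are usually sharded across more chips than this floor suggests, for bandwidth and latency rather than capacity, but the per-chip memory figure is what makes the smaller shard counts possible in the first place.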
Google is also making its internal Pathways distributed runtime available to cloud customers for the first time, enabling multi-host inference with dynamic scaling across Ironwood pods. Alongside Ironwood's general availability, Google previewed its eighth-generation TPU architecture, and for the first time, it is splitting the line in two. TPU 8t, codenamed Sunfish, is a training accelerator designed with Broadcom. It features two compute dies, one I/O chiplet, and eight stacks of 12-high HBM3e, an upgrade from Ironwood's eight-high stacks that delivers roughly 30% higher memory bandwidth. TPU 8i, codenamed Zebrafish, is an inference accelerator designed with MediaTek. It uses a single compute die, one I/O die, and six stacks of HBM3e, a simpler configuration engineered to deliver inference at 20 to 30% lower cost than the training variant. Both chips will be fabricated on TSMC's 2-nanometre process node and are targeted for late 2027. The bifurcation is the most significant architectural decision in Google's TPU history. Every previous generation was a unified chip used for both training and inference. Splitting the two acknowledges a reality the industry has been approaching for years: the workloads are fundamentally different. Training demands maximum compute density and memory bandwidth to process trillions of parameters across weeks of continuous operation. Inference demands cost efficiency and low latency to serve billions of queries per day without the economics collapsing. Designing one chip that is optimal for both has always been a compromise. Google has decided to stop compromising. The decision also locks in the multi-supplier strategy. Broadcom handles the high-performance training silicon under a relationship that has been described as a $46 billion AI contract. MediaTek handles cost-optimised inference, having already proved its ability to deliver I/O modules for Ironwood at 20 to 30% lower cost than alternatives. MediaTek has reportedly requested a sevenfold increase in CoWoS advanced packaging capacity from TSMC to fulfil Google's orders. Anthropic, whose deal has expanded to 3.5 gigawatts of compute coming online in 2027, will be among the first customers for both variants. The eighth-generation roadmap makes explicit what Ironwood's general availability already implied: Google is building the AI industry's most diversified custom chip supply chain. Broadcom designs Ironwood and the TPU 8t training chip under an agreement running through 2031. MediaTek designs the TPU 8i inference chip and cost-optimised variants including TPU v7e and v8e. Google is in talks with Marvell to develop a memory processing unit and another inference-focused TPU, potentially becoming the third design partner. Intel provides CPUs and is collaborating on custom IPU development. The multi-supplier approach gives Google negotiating leverage, supply chain redundancy, and the ability to assign each partner the workload profile that matches its strengths. Google projects 4.3 million TPU shipments in 2026, rising to 10 million in 2027 and more than 35 million in 2028. The capital expenditure to support this is enormous. Google has committed $175 billion to $185 billion in infrastructure spending for 2026, nearly doubling the $91.4 billion it spent in 2025, itself a figure that was revised upward twice during the year. Roughly 60% goes to servers and 40% to data centres and networking equipment. 
Combined with Microsoft, Meta, and Amazon, total big tech AI infrastructure spending is approaching $700 billion this year. Anthropic is Ironwood's marquee customer and increasingly Google's most important cloud client. The company will have access to up to one million TPU chips and more than a gigawatt of capacity in 2026. The first phase covers 400,000 Ironwood units, worth an estimated $10 billion in finished racks from Broadcom. The remaining 600,000 units are rented through Google Cloud Platform. The partnership has already expanded: Anthropic's deal now covers 3.5 gigawatts of compute coming online in 2027, positioning it as the anchor customer for the eighth-generation TPUs as well. Anthropic's run-rate revenue has surpassed $30 billion, up from roughly $9 billion at the end of 2025, and it described itself as "compelled by the impressive price-performance gains." It maintains a diversified compute strategy that also includes Amazon Trainium and Nvidia GPUs. The fact that Anthropic is even exploring designing its own custom chips while simultaneously committing to a multi-gigawatt Google TPU deployment illustrates both the scale of inference demand and the degree to which the economics of AI hardware have become a competitive variable in their own right. Ironwood reaches general availability roughly one year after Nvidia's Blackwell architecture. In that year, every major cloud provider has accelerated its custom silicon programme. Amazon values its custom chip business at $50 billion and has hinted at selling Trainium externally. Microsoft's Maia 200, announced in January, claims three times the FP4 performance of Amazon's Trainium 3. Meta continues developing MTIA. The custom ASIC market for AI is growing at 44.6% annually, compared with 16.1% for GPUs. Analysts project that custom chips could account for 45% of the AI chip market by 2028, and that Nvidia's share of the inference market specifically could fall from more than 90% to between 20 and 30%. Nvidia's response has been to lock in the ecosystem through NVLink Fusion, a strategy that makes its interconnect the default standard for custom silicon, including chips designed to compete with Nvidia's own GPUs. Jensen Huang has brushed off the custom chip threat with characteristic confidence: "Not that easy building something better." He is not wrong that Nvidia's CUDA software ecosystem, developer tools, and interconnect standards create switching costs that raw hardware specifications do not capture. But the direction of the market is clear. The hyperscalers are building their own chips not because they think they can beat Nvidia on every metric but because they have concluded that purpose-built inference silicon, optimised for their specific workloads and deployed at their specific scale, produces better economics than buying Nvidia GPUs at Nvidia's margins. Google Cloud holds roughly 11% of the cloud infrastructure market, behind AWS at 31% and Azure at 25%, but exited 2025 with the fastest growth rate among the three and its first sustained period of profitability. Ironwood does not change Google's position in the cloud market overnight, and Nvidia's Rubin architecture, also targeting late 2027, may reclaim performance advantages in memory and networking that could shift total cost of ownership calculations back in its favour. 
But the direction Google has committed to is now irreversible: a roadmap stretching from Ironwood shipping today through two purpose-built eighth-generation chips at 2 nanometres in 2027, backed by $185 billion in annual infrastructure spending, a four-partner supply chain, and an anchor customer in Anthropic that is scaling to 3.5 gigawatts. The chip race is, in the end, a margin race. Google is betting that the margin belongs to the company that builds the silicon, not the company that buys it, and it is now building two chips where it used to build one.
[8]
Google launches TPU 8 chips to speed AI training and cut costs
Google said TPU 8t targets more than 97 percent "goodput," a term used to measure productive compute time instead of idle time caused by failures or bottlenecks. That matters because delays across massive clusters can add days to training schedules for advanced AI systems. The TPU 8i focuses on inference, the stage where trained AI models answer prompts, run tools, and power software agents. Google said TPU 8i includes 288 GB of high-bandwidth memory and 384 MB of on-chip SRAM, helping keep active model data closer to the processor for faster responses. The chip is also paired with Google's Axion Arm-based host CPUs and gets upgraded interconnect bandwidth for Mixture of Experts, or MoE, models. These architectures activate only parts of a model at a time to lower costs while scaling performance. According to Google, TPU 8i delivers 80% better performance-per-dollar than the prior generation, allowing customers to handle nearly twice the workload at the same cost. The launch highlights how AI infrastructure is shifting beyond general-purpose GPUs toward specialized chips tuned for different workloads.
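A minimal sketch of the mixture-of-experts routing described above, with made-up sizes: a router scores each token against all experts, only the top-k experts are actually run, and the gate weights combine their outputs. This is a generic textbook-style illustration, not Google's (or anyone's) production router.

```python
# Generic top-k mixture-of-experts routing with illustrative sizes.
# A real router would normalize the gate weights (e.g. with a softmax) and run
# experts in parallel across chips; this just shows the selection mechanism.
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d_model, tokens = 16, 2, 64, 4

router_w = rng.standard_normal((d_model, num_experts))
experts  = rng.standard_normal((num_experts, d_model, d_model))  # one weight matrix per expert
x        = rng.standard_normal((tokens, d_model))

scores = x @ router_w                                # (tokens, num_experts)
chosen = np.argsort(scores, axis=-1)[:, -top_k:]     # top-k expert ids per token

out = np.zeros_like(x)
for t in range(tokens):
    for e in chosen[t]:
        out[t] += scores[t, e] * (x[t] @ experts[e])  # gate-weighted sum of chosen experts

print(chosen)  # different tokens pick different experts
```

Because each token can pick a different set of experts, and those experts may live on different chips, the routing step is what creates the irregular chip-to-chip traffic that the upgraded interconnect is meant to absorb.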
[9]
Google assembles four-partner chip supply chain with Broadcom, MediaTek, Marvell to challenge Nvidia in inference
Summary: Google is building the AI industry's most diversified custom chip supply chain, with four design partners (Broadcom, MediaTek, Marvell, Intel) and a roadmap stretching from the Ironwood TPU now shipping in the millions to TPU v8 chips at TSMC 2nm in late 2027. The strategy, detailed ahead of Google Cloud Next, splits the next generation explicitly: Broadcom's "Sunfish" for training, MediaTek's "Zebrafish" for inference at 20-30% lower cost, with Marvell in talks to add a memory processing unit and an additional inference TPU, positioning Google's custom silicon as the most direct challenge to Nvidia's dominance in AI inference. Google is assembling the most diversified custom chip supply chain in the AI industry, with four design partners, a fabrication relationship with TSMC, and a product roadmap that now stretches from the inference chips it is shipping today to the 2-nanometre processors it expects to deploy in late 2027. The strategy, detailed in a Bloomberg feature ahead of Google Cloud Next this week, positions Google's silicon programme as the most direct challenge to Nvidia's dominance in AI inference, the phase of computing where models serve users rather than learn from data. The centrepiece is Ironwood, Google's seventh-generation TPU and the first designed specifically for inference. It delivers ten times the peak performance of the TPU v5p, offers 192 gigabytes of HBM3E memory per chip with 7.2 terabytes per second of bandwidth, and scales to 9,216 liquid-cooled chips in a single superpod producing 42.5 FP8 exaflops. Ironwood is now generally available to Google Cloud customers. Google plans to produce millions of units this year, and Anthropic has committed to up to one million TPUs. Meta also has a rental arrangement. Google's chip programme now involves four distinct design partners, each handling different segments of the product line. Broadcom, which signed a long-term agreement on 6 April to supply TPUs and networking components through 2031, handles the high-performance chip variants. It is also designing the next-generation TPU v8 training chip, codenamed "Sunfish," targeted at TSMC's 2-nanometre process node for late 2027. Broadcom commands more than 70% of the custom AI accelerator market and is projecting $100 billion in AI chip revenue by 2027. MediaTek is designing the cost-optimised inference variant of the TPU v8, codenamed "Zebrafish," also targeting TSMC 2nm in late 2027. MediaTek's involvement began with the I/O modules and peripheral components on Ironwood, where its designs run 20 to 30% cheaper than alternatives. The TPU v8 strategy splits the product line explicitly: Broadcom builds the training chip, MediaTek builds the inference chip, and Google gains the negotiating leverage that comes from having each partner know the other exists. Marvell Technology, which is in talks with Google to develop a memory processing unit and a new inference-focused TPU, would become the third design partner if those negotiations produce a contract. Google plans to produce nearly two million of the memory processing units, with design finalisation expected by next year. Marvell's custom silicon business runs at a $1.5 billion annual rate across 18 cloud-provider design wins, and Nvidia invested $2 billion in the company in March. Intel entered the picture on 9 April with a multi-year deal to supply Xeon processors and custom infrastructure processing units for Google's AI data centre infrastructure. 
The arrangement covers the networking and general-purpose compute layers that surround the TPUs rather than the AI accelerators themselves. TSMC fabricates all of Google's custom silicon. The relationship is structural: every chip Google designs, regardless of which partner designed it, runs through TSMC's fabs. The shift from training to inference as the dominant AI compute cost is the strategic premise behind Google's entire chip programme. Training a frontier model is a singular, intensive event. Inference is continuous and scales with every user, every query, and every product that incorporates AI. Google serves billions of AI-augmented search queries, Gemini conversations, and Cloud AI API calls daily. At that scale, the cost per inference determines the economics of the entire AI business. Nvidia's GPUs remain dominant for training workloads, where their programmability and the CUDA software ecosystem create switching costs that custom chips cannot easily replicate. But inference workloads are more predictable, more repetitive, and more amenable to the kind of fixed-function optimisation that custom silicon excels at. A purpose-built inference chip that costs less per query than an Nvidia GPU, even if it cannot match the GPU's versatility, wins on the metric that matters at Google's scale. This is why Google is investing in multiple inference chip paths simultaneously. Ironwood serves today's workloads. MediaTek's Zebrafish targets the next generation at lower cost. Marvell's proposed chips would add yet another option. The redundancy is deliberate: Google is building optionality into a supply chain where dependence on any single partner creates pricing risk, capacity risk, and the strategic vulnerability of having its AI infrastructure controlled by someone else's roadmap. Google's total expected TPU shipments are projected at 4.3 million units in 2026, scaling to more than 35 million by 2028. Anthropic's commitment alone represents up to one million of those chips, with access to approximately 3.5 gigawatts of next-generation TPU-based compute starting in 2027. Broadcom's Mizuho-estimated AI revenue from its Google and Anthropic relationships is $21 billion in 2026, rising to $42 billion in 2027. The custom ASIC market more broadly is growing faster than GPUs. TrendForce projects custom chip sales will increase 45% in 2026, compared with 16% growth in GPU shipments. The market is expected to reach $118 billion by 2033. Google is not the only hyperscaler building custom inference silicon: Amazon has Trainium and Inferentia, Microsoft has Maia, and Anthropic is exploring its own chip programme. But Google's multi-partner, multi-generation approach is the most architecturally ambitious. Google Cloud Next opens on Wednesday in Las Vegas with keynotes from Sundar Pichai and Thomas Kurian. The conference is expected to showcase the next-generation TPU architecture and the custom silicon roadmap that connects Ironwood to the v8 generation. The timing of the Bloomberg feature, one day after The Information broke the Marvell talks and two days before Cloud Next, suggests Google is using the conference to frame its chip programme as a coherent strategy rather than a series of individual partnerships. The challenge Nvidia faces is not that any single Google chip will outperform its GPUs. It is that Google is building a system in which multiple custom chips, each optimised for a specific workload and cost point, collectively reduce the share of Google's AI compute that runs on Nvidia hardware. 
Nvidia's response has been to embed itself in the custom chip ecosystem rather than fight it: the $2 billion Marvell investment and the NVLink Fusion programme ensure Nvidia retains a position in racks where its GPUs are supplemented or replaced by ASICs. For Google, the bet is that controlling its own silicon, across multiple partners and multiple generations, will produce a cost advantage in inference that compounds over time. The scale of Nvidia's business means the incumbent will not be displaced quickly. But the economics of inference favour custom silicon over general-purpose GPUs, and no company has more inference volume than Google. The four-partner supply chain, the dual-track v8 roadmap, and the millions of Ironwood chips shipping this year are the infrastructure for a competitive position that Google expects to strengthen with every query it serves.
[10]
Google Cloud unveils eighth-generation TPUs built to support an agentic era
* Google unveils next-generation TPUs - splits off into two series, 8t and 8i
* 8t superpods can deliver 121 ExaFlops, up from 42.5 last year
* 8i delivers 3x more SRAM and increased HBM

Google Cloud has announced its eighth-generation Tensor Processing Units (TPUs), designed specifically for the agentic shift we're seeing within AI at the moment. Revealed at Google Cloud Next 2026, the upgrades focus on longer context windows, multi-step reasoning and responsiveness at scale, and Google's cloud infrastructure is being rebuilt to support persistent memory, continuous inference and multi-model workloads. This year, we're seeing two distinct TPUs designed to support massive HBM scaling, with Google Cloud placing an emphasis on memory bandwidth as much as compute.

TPU 8t and 8i target trillion-parameter training in million-chip clusters

The first of two TPUs, 8t, has been optimized to be distributed across huge clusters for training foundation models. With around an 80% year-over-year improvement in performance per dollar, the company says it will train trillion-parameter models more efficiently. Google Cloud explained that a single TPU 8t superpod can scale up to 9,600 chips, delivering 2PB of shared HBM and 121 ExaFlops of compute. For comparison, last year Ironwood was rated at up to 9,216 chips in a superpod and 42.5 ExaFlops.

Google Cloud also warned of "the latency wall" we face in an always-on agentic era, hence the launch of 8i, a second chip which serves as a post-training and inference engine. TPU 8i sees around a 3x increase in on-chip SRAM to 384MB as well as 288GB of HBM, with pod size now up to 1,152 chips from 256, delivering 11.6 ExaFlops of performance (up from 1.2 ExaFlops). As for energy and thermal efficiency, Google Cloud boasts of up to 2x better performance-per-watt over Ironwood, the predecessor. "We['ve] innovated across hardware and software to enable our data centers to deliver six times more computing power per unit of electricity than they did just five years ago," SVP and Chief Technologist for AI and Infrastructure Amin Vahdat explained.

General availability for Google Cloud customers is expected in the coming months, and naturally, TPU 8t and TPU 8i will be at the forefront of the latest Gemini models. The company also sees the eighth-gen hardware playing a role in developing the next frontier models by distributing training beyond a single superpod using Pathways and JAX to unlock scaling beyond one million TPU chips in any single training cluster - something execs confirmed at the event is currently entirely theoretical (but technically possible), with the TPUs yet to be made available at such a scale.
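The Pathways-and-JAX point above comes down to how a training step is written: in JAX the program is expressed once over a named device mesh, and the same code runs whether that mesh is a handful of chips or, in principle, many pods. Below is a minimal, hedged sketch; the one-axis mesh and sizes are illustrative, and nothing here reflects how Google actually configures its clusters.

```python
# Minimal JAX sharding sketch: the same jitted step runs on however many devices
# the mesh contains. The mesh shape here is illustrative, not a real pod layout.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())             # whatever accelerators are visible
mesh = Mesh(devices, axis_names=("data",))    # real systems use multi-axis meshes

batch = jnp.ones((devices.size * 8, 1024), dtype=jnp.bfloat16)
batch = jax.device_put(batch, NamedSharding(mesh, P("data", None)))  # shard along axis 0

@jax.jit
def step(x):
    # The compiler inserts whatever cross-device communication this reduction
    # needs; the Python code does not change as the mesh grows.
    return jnp.mean(x.astype(jnp.float32))

print(step(batch))
```

Scaling the same style of program across multiple pods is where Pathways and the new interconnects come in; the claim in the article is about how far that model stretches, not about any change to how the code is written.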
[11]
Google doesn't pay the Nvidia tax. Its new TPUs explain why.
Every frontier AI lab right now is rationing two things: electricity and compute. Most of them buy their compute for model training from the same supplier, at the steep gross margins that have turned Nvidia into one of the most valuable companies in the world. Google does not. On Tuesday night, inside a private gathering at F1 Plaza in Las Vegas, Google previewed its eighth-generation Tensor Processing Units. The pitch: two custom silicon designs shipping later this year, each purpose-built for a different half of the modern AI workload. TPU 8t targets training for frontier models, and TPU 8i targets the low-latency, memory-hungry world of agentic inference and real-time sampling. Amin Vahdat, Google's SVP and chief technologist for AI and infrastructure (pictured above left), used his time onstage to make a point that matters more to enterprise buyers than any individual spec: Google designs every layer of its AI stack end-to-end, and that vertical integration is starting to show up in cost-per-token economics that Google says its rivals cannot match. The more interesting story behind v8t and v8i is when the decision to split the roadmap was made. The call came in 2024, according to Vahdat -- a year before the industry at large pivoted to reasoning models, agents and reinforcement learning as the dominant frontier workload. At the time, it was a contrarian read. "We realized two years ago that one chip a year wouldn't be enough," Vahdat said during the fireside. "This is our first shot at actually going with two super high-powered specialized chips." For enterprise buyers, the implication is concrete. Customers running fine-tuning or large-scale training on Google Cloud and customers serving production agents on Vertex AI have been renting the same accelerators and eating the inefficiency. V8 is the first generation where the silicon itself treats those as different problems with two sets of chips. On paper, TPU 8t is an aggressive generational step. According to Google, 8t delivers 2.8x the FP4 EFlops per pod (121 vs 42.5) against Ironwood, the seventh-generation TPU that shipped in 2025, doubles bidirectional scale-up bandwidth to 19.2 Tb/s per chip, and quadruples scale-out networking to 400 Gb/s per chip. Pod size grows modestly from 9,216 to 9,600 chips, held together by Google's 3D Torus topology. The number that matters most to IT leaders evaluating where to run frontier-scale training: 8t clusters (Superpods) can scale beyond 1 million TPU chips in a single training job via a new interconnect Google is calling Virgo networking. 8t also introduces TPU Direct Storage, which moves data from Google's managed storage tier directly into HBM without the usual CPU-mediated hops. For long training runs where wall-clock time is the cost driver, collapsing that data path reduces the number of pod-hours needed to finish each epoch. If 8t is an evolutionary step, TPU 8i is the more architecturally interesting chip. It is also where the story for IT buyers gets most compelling. The year-over-year spec jumps are, as Vahdat put it, "stunning." According to Google, 8i delivers 9.8x the FP8 EFlops per pod (11.6 vs 1.2), 6.8x the HBM capacity per pod (331.8 TB vs 49.2), and a pod size that grows 4.5x from 256 to 1,152 chips. What drove those numbers is a rethink of the network itself. 
Vahdat explained the insight directly: Google's default way of connecting chips together favored bandwidth over latency -- good for moving large amounts of data through, but not built to minimize the time it takes a response to come back. That profile works for training. For agents, it does not. In partnership with Google DeepMind, the TPU team built what Google calls Boardfly topology specifically to reduce the network diameter -- shrinking the number of hops between any two chips in a pod. Paired with a Collective Acceleration Engine and what Google describes as very large on-chip SRAM, 8i delivers a claimed 5x improvement in latency for real-time LLM sampling and reinforcement learning.

The subtext across Vahdat's presentation was a six-layer diagram Google calls its AI stack: energy at the foundation, then data center land and enclosures, AI infrastructure hardware, AI infrastructure software, models (Gemini 3), and services on top. Vahdat noted that designing each layer in isolation forces you to the least common denominator at every layer. Google designs them together. This is where the competitive story for IT buyers and analysts crystallizes. OpenAI, Anthropic, xAI and Meta all depend heavily on Nvidia silicon to train their frontier models. Every H200 and Blackwell GPU they buy carries Nvidia's data-center gross margin -- the informal "Nvidia tax" that industry analysts have flagged for two years running as a structural cost disadvantage for anyone renting rather than designing. Google pays fab, packaging and engineering costs on its TPUs. It does not pay that margin.

For procurement and infrastructure teams, TPUv8 reframes the 2026-2027 cloud evaluation in concrete ways. Teams training large proprietary models should look at 8t availability windows, Virgo networking access, and goodput SLAs -- not just headline EFlops. Teams serving agents or reasoning workloads should evaluate 8i availability on Vertex AI, independent latency benchmarks as they emerge, and whether HBM-per-pod sizing fits their context windows. Teams consuming Gemini through Gemini Enterprise should inherit the 8i lift and should expect the ceiling on what they can deploy in production to rise meaningfully through 2026.

The caveats are real. General availability is still "later in 2026." V8 is a roadmap signal, not a procurement decision today. Google's benchmarks are self-reported; independent numbers will no doubt come from early cloud customers and third-party evaluators over the next two quarters. And portability between JAX/XLA and the CUDA/PyTorch ecosystem remains a friction cost worth weighing when negotiating any multi-year commitment.

Looking further out, Vahdat made two predictions worth noting. First, general-purpose CPUs will see a resurgence inside AI systems -- not as accelerators, but as orchestration compute for agent sandboxes, virtual machines and tool execution. Second, framed explicitly as an industry prediction rather than a Google roadmap preview, specialization will keep gaining ground. As general-purpose CPU gains plateau at a few percent a year, workloads that matter will demand purpose-built silicon. "Two chips might become more," Vahdat said -- without specifying whether the "more" would mean future TPU variants or other classes of specialized accelerators. The frontier compute race used to be a question of who could buy the most H100s. It is now a question of who controls the stack. The shortlist of companies that genuinely do is, for the moment, two: Google and Nvidia.
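To make the "network diameter" framing concrete, the snippet below computes the worst-case hop count of a 3D torus like the one Google cites for 8t training pods. The dimensions are purely illustrative, since Google has not published the actual pod layout, and Boardfly's own structure is likewise not public.

```python
# Worst-case hop count (diameter) of a 3D torus with wraparound links.
# Dimensions are illustrative: 16 x 20 x 30 = 9,600 chips matches the pod
# size but not necessarily Google's real arrangement.
def torus_diameter(dims):
    # In each dimension the farthest chip is at most floor(size / 2) hops away,
    # thanks to the wraparound link; the worst case sums over dimensions.
    return sum(d // 2 for d in dims)

print(torus_diameter((16, 20, 30)))  # 33 hops in the worst case
```

A lower-diameter topology shrinks that worst case, which is the lever Boardfly pulls for latency-sensitive, all-to-all traffic.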
[12]
Google developing inference AI chips to rival Nvidia
Google $GOOGL is developing new chips dedicated to AI inference in partnership with Marvell Technology, positioning Alphabet to more directly compete with Nvidia $NVDA in a semiconductor category driven by surging demand for AI software, according to Bloomberg. After a model is trained, inference is the stage where it actually does its job -- fielding queries and producing outputs. Google plans to announce a new generation of its tensor processing units, known as TPUs, at the Google Cloud Next conference in Las Vegas this week, with inference-focused chips expected to follow. "The battleground is shifting towards inference," Gartner analyst Chirag Dekate told Bloomberg. Google Chief Scientist Jeff Dean said in an interview that as AI demand grows, "it now becomes sensible to specialize chips more for training or more for inference workloads." Amin Vahdat, who oversees Google's AI infrastructure and chip work, declined to comment on specific inference chip plans but said more details would likely be shared "in the relatively near future." According to Partha Ranganathan, a vice president and engineering fellow at the company, Google weighed the idea of distinct training and inference chips in its early days before ultimately deciding against it. That approach may be changing as the broader AI spending cycle shifts from training toward inference workloads.

Entering the inference market, Google can draw on advantages built over years of in-house chip development, substantial revenue from its search business, and an unusually close relationship with the AI models its hardware is meant to run. No other leading AI developer manufactures its own chips at comparable volume, a structural edge that tightens the loop between the people building Google's models and those designing the silicon they run on. Demand for Google's TPUs has grown substantially. Meta $META struck a multibillion-dollar agreement to procure TPUs via Google Cloud, and Santosh Janardhan, who leads Meta's infrastructure operations, said that initial results point to possible performance gains on inference tasks. Anthropic, which expanded its TPU access to as many as 1 million chips, also signed a separate deal with Broadcom $AVGO -- Google's TPU manufacturing partner -- for chips enabling roughly 3.5 gigawatts of computing power starting in 2027. A person familiar with the matter told Bloomberg that Google has been piloting an arrangement under which enterprise customers, Anthropic among them, could deploy TPU hardware on-premises instead of relying solely on Google's cloud infrastructure. The company has also opened TPU access to outside tools such as PyTorch, moving away from a purely proprietary software environment.

Nvidia is still the leader in AI chips, especially for training. Nvidia CEO Jensen Huang said at the company's GTC conference earlier this year that its chips can handle applications "you can't do with TPUs." Google uses both TPUs and Nvidia GPUs for its own AI projects. Supply constraints may complicate Google's ambitions. An unnamed startup executive described chip scarcity as a real obstacle, telling Bloomberg the company had little access to TPUs. Google DeepMind CEO Demis Hassabis, for his part, confirmed that available supply is being steered toward leading AI organizations -- the cohort he described as "the more elite teams."
[13]
Google Takes Aim at Nvidia With New Tensor Chips to Power AI Boom - Decrypt
TPU 8i features 3x more on-chip memory to handle the iterative demands of AI agents. Google unveiled two AI processors at its Cloud Next 2026 conference in Las Vegas on Wednesday, marking the company's eighth generation of custom silicon designed to challenge Nvidia's AI chip dominance. The training-focused TPU 8t delivers nearly 3x the compute performance per pod compared to its predecessor, with a single superpod scaling to 9,600 chips and delivering 121 ExaFlops of compute capacity. The architecture also offers 2.8x better price-to-performance, according to Google. The TPU 8i takes a different approach, optimizing for inference workloads with 3x more on-chip SRAM than previous generations -- 384 MB of on-chip SRAM paired with 288 GB of high-bandwidth memory. The chip delivers up to 80% better performance per dollar and 2x the performance per watt, the company claimed. Both chips leverage Google's new Boardfly architecture, which achieves up to a 50% improvement in latency for communication-intensive workloads by reducing network diameter, the technical documentation shows. The hardware announcement follows Google's expanded partnership with Anthropic earlier this month, which will provide the AI startup with multiple gigawatts of next-generation TPU capacity. The deal highlights how Google is leveraging its custom silicon to attract major AI companies seeking alternatives to Nvidia's GPUs in the increasingly competitive infrastructure market. Google CEO Sundar Pichai positioned the chips as purpose-built for AI agents, stating they deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively. The company has already secured adoption from Citadel Securities, with the financial services firm choosing TPUs to power their AI workloads. The dual-chip strategy reflects the diverging computational needs of modern AI systems: massive parallel processing for training frontier models versus rapid, memory-intensive operations for deploying those models as interactive agents.
[14]
Our eighth generation TPUs: two chips for the agentic era
Today at Google Cloud Next, we are introducing the eighth generation of Google's custom Tensor Processing Unit (TPU), coming soon with two distinct, purpose-built architectures for training and inference: TPU 8t and TPU 8i. These two chips are designed to power our custom-built supercomputers, driving everything from cutting-edge model training and agent development to massive inference workloads. TPUs have been powering leading foundation models, including Gemini, for years, and together these eighth-generation TPUs will deliver scale, efficiency and capabilities across training, serving and agentic workloads. In this age of AI agents, models must reason through problems, execute multi-step workflows and learn from their own actions in continuous loops. This places a new set of demands on infrastructure, and TPU 8t and TPU 8i were designed in partnership with Google DeepMind to take on the most demanding AI workloads and adapt to evolving model architectures at scale. TPUs set the standard for a number of ML supercomputing components, including custom numerics, liquid cooling, custom interconnects and more, and our eighth-generation TPUs are the culmination of more than a decade of development. The key insight behind the original TPU design continues to hold today: by customizing and co-designing silicon together with networking and software, including model architecture and application requirements, we can deliver dramatically better power efficiency and absolute performance. We are thrilled to see how a decade of innovation translates into real-world breakthroughs. Today, pioneering organizations like Citadel Securities are pushing the boundaries of what's possible, choosing TPUs to power their cutting-edge AI workloads.
[15]
Two new TPUs to power the next wave of AI training and inference at Google - SiliconANGLE
Google LLC introduced two new custom silicon chips for artificial intelligence today at Google Cloud Next 2026, unveiling two distinct Tensor Processing Unit architectures built for training and inference: the eighth-generation TPU 8t and TPU 8i. The company said it designed the pair of chips to tackle the next generation of AI workloads by splitting across the differing demands of the market. AI depends on two primary tasks: building the models and running them. The rise of AI agents has driven demand for powerful AI models to act as the "brains" of reasoning machines, and equally powerful hardware to run them in the cloud. Where the previous chip, Ironwood TPU, was pitched as a single, massive flagship platform for the inference era, Google is now splitting its latest generation into separate architectures for large-scale training and high-concurrency reasoning to support the agentic era.

Google said it optimized TPU 8t as a workhorse for massive pretraining and embedding-heavy workloads by using a 3D torus network topology, a technology the company said has proven to scale well at larger chip-networking sizes. Compared with the last generation, TPU 8t can network 9,600 chips in a single pod, versus 9,216 chips for Ironwood. TPU 8t uses SparseCore, a specialized accelerator that handles the irregular memory access common to large language model lookups, along with native four-bit floating point to ease memory-bandwidth constraints. This allows training to happen faster and with better model compression, doubling throughput while maintaining accuracy with smaller memory footprints. By reducing the bits per parameter through a process called quantization, it becomes possible to run larger models on less powerful systems. This reduces energy use and allows larger models to fit on local hardware, take up less space and reach peak utilization. The company said it's aiming to capture the training market at a much lower cost. Google claimed that TPU 8t delivers up to 2.7 times the performance per dollar of the Ironwood TPU for large-scale training.

After models have been trained and prepared, they need to be put to work. That's where inference comes into play, and where Google said the new TPU 8i chip shines, helping serve large models by optimizing post-training and high-concurrency reasoning using high-bandwidth memory and a specialized network topology. TPU 8i employs three times more static random-access memory than Ironwood, allowing it to host a larger key-value cache at inference time for LLMs, which significantly speeds up text generation. In addition, the company said, it built a reasoning system called the Collectives Acceleration Engine that processes the reduction and synchronization steps required during autoregressive decoding and "chain-of-thought." To connect more chips together and weave them into a system where all chips can "see" each other, Google developed a custom network topology called Boardfly ICI. It can interconnect up to 1,152 chips, reducing network latency by shrinking the network diameter and the number of hops a data packet must take to cross the system. Google said it cuts the hops required for all-to-all communication -- a necessity for mixture-of-experts LLM and reasoning model inference -- by up to 50% overall.
As for cost savings, the company said TPU 8i targets about an 80% performance-per-dollar improvement over Ironwood at low-latency targets, especially when serving extremely large MoE frontier models.
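The excerpt above leans on quantization (fewer bits per parameter) to explain 8t's memory savings. The toy example below shows the basic mechanic with a symmetric 4-bit integer scheme; real FP4 formats such as e2m1, and whatever TPU 8t implements natively, are more involved, so treat this purely as an illustration.

```python
import jax.numpy as jnp

# Toy symmetric 4-bit quantizer. Real FP4 hardware formats differ; this only
# illustrates why fewer bits per parameter shrink memory footprint and bandwidth.
def quantize_int4(w):
    scale = jnp.max(jnp.abs(w)) / 7.0          # map values into the signed range [-7, 7]
    q = jnp.clip(jnp.round(w / scale), -7, 7)  # 4-bit integer codes
    return q.astype(jnp.int8), scale           # stored compactly (int8 here for simplicity)

def dequantize(q, scale):
    return q.astype(jnp.float32) * scale

w = jnp.array([0.12, -0.5, 0.03, 0.9, -0.77])
q, s = quantize_int4(w)
print(q, dequantize(q, s))
```

Cutting a parameter from 16 bits to 4 reduces its storage and the bandwidth needed to move it by roughly 4x, which is the effect the article describes.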
[16]
Google Unveils its Eighth-Generation TPU Chips - Phandroid
Google recently announced the launch of its eighth generation custom Tensor Processing Units, the TPU 8t and TPU 8i. Google says that the new chips feature architectures designed to handle the growing demands of frontier model development and agentic workloads. According to Google, the new chips represent a fundamental split in its hardware strategy, offering a training-optimized processor in the TPU 8t and a low-latency inference specialist in the TPU 8i. Both are powered by Google's custom Axion Arm-based CPUs, marking the first time the company has fully integrated its own silicon across the entire system host in a bid to eliminate performance bottlenecks. On paper, the TPU 8t is engineered to speed up frontier model development cycles by delivering nearly three times the compute performance per pod compared to its predecessor. A single TPU 8t superpod now scales to 9,600 chips, providing 121 ExaFlops of compute and two petabytes of shared high-bandwidth memory. Google has also integrated the Virgo Network fabric and 10x faster storage access via TPUDirect, which allows for near-linear scaling up to one million chips in a single logical cluster for model training. Meanwhile, the TPU 8i addresses the specific challenges of "agentic" AI, where multiple specialized models must interact with minimal lag. It features 384 MB of on-chip SRAM (triple the capacity of its predecessor) to keep active working sets entirely on-chip. The architecture also doubles interconnect bandwidth to 19.2 Tb/s and features the Boardfly topology, which reduces network diameter by over 50%. Both chips are co-designed with Google's Gemini models in mind and are open to the broader developer ecosystem, with native support for JAX, PyTorch, and vLLM, along with bare-metal access to give customers direct hardware control.
[17]
Google Unveils New AI Super-Chips To Slash Costs, Rival Nvidia - Alphabet (NASDAQ:GOOGL)
Google Splits AI Chips to Boost Efficiency

Google is separating AI training and inference tasks into distinct processors in its eighth-generation Tensor Processing Unit (TPU) lineup. Senior Vice President Amin Vahdat wrote on his blog on Wednesday that "With the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving." The company will offer TPU 8t for building AI models and TPU 8i for running them, aiming to improve performance and cost efficiency. Google designs TPUs as specialized chips to speed up machine learning tasks.

Competing With Nvidia and Expanding Adoption

Google continues to position its TPUs as an alternative to Nvidia's dominant GPUs while still offering Nvidia-based services to cloud customers. CEO Sundar Pichai wrote on his blog that the new architecture is designed "to deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively." At the same time, Nvidia is advancing its own AI hardware, including inference-focused silicon enhanced by its Groq technology acquisition, underscoring the competitive dynamic between the two companies.

Focus on Cost, Speed, and Scale Versus Rivals

Google is targeting lower costs and faster AI responses by increasing on-chip memory and improving efficiency. Vice President Mark Lohmeyer told Bloomberg that "The number of transactions is going way up, and the cost per transaction needs to go way down for it to scale." Adoption is growing, with companies like Citadel Securities and institutions such as U.S. national labs already using the chips. Anthropic has also committed to large-scale TPU usage.

Latest Nvidia Collaboration

NVIDIA and Google Cloud are also deepening a long-running partnership to make it easier and cheaper for companies to build and run AI applications. The tech giants have worked together for over a decade to build a shared platform that helps businesses move AI from testing into real-world use. This setup supports everything from automated workflows to tools used in industries like manufacturing and robotics. At Google Cloud Next, the companies introduced updates designed to make AI systems faster and more efficient. Google Cloud's Mark Lohmeyer said combining Google's infrastructure with NVIDIA's technology gives customers the ability to build and run AI tools while "optimizing for performance, cost, and sustainability." The partnership allows companies to use powerful AI tools securely and at scale, whether in the cloud or closer to their own data.

GOOGL Price Action: Alphabet shares were up 1.69% at $337.91 at the time of publication on Wednesday. The stock is approaching its 52-week high of $349.00, according to Benzinga Pro data.
[18]
Google Splits TPUv8 Strategy Into Two Chips, Handing Broadcom Training and MediaTek Inference Duties
Google is preparing two brand-new chips in its TPUv8 family, one for training and one for inference workloads. Reports indicate that Google is working on not one, not two, but three chips that will form the basis of its next-gen TPU and AI ventures. We already discussed two of these chips, the memory processing unit and the next-gen TPU series. Now, more details have emerged regarding what to expect from the next-gen TPU series.

The TPUv8 AI chip family will replace the existing TPUv7 "Ironwood" lineup that Google has been offering since 2025. The two new chips expected next week are TPUv8t and TPUv8i. The primary area of focus for each is relatively straightforward: the TPUv8i is codenamed "Zebrafish" and designed as a cost-efficient inference accelerator, while the TPUv8t is codenamed "Sunfish" and designed as a high-performance training accelerator. The inference chip "TPUv8i" will be designed by MediaTek, while the training chip "TPUv8t" will be designed by Broadcom. Notice that neither of the two chips is being designed by Marvell, which was reportedly working with Google on its next-gen TPU family; that collaboration presumably involves either a custom TPU solution or a later "post TPUv8" series. Both Google TPUv8 series AI chips will be tightly integrated with the company's Axion Arm CPUs. Based on the Neoverse N3 Armv9.2 core architecture, Axion processors have been deployed since 2024 and will be the go-to choice for the next-gen lineup.

Google's latest v8 architecture Tensor Processing Unit (TPU) is expected to be unveiled this week, which will not only boost the semiconductor and assembly supply chain markets, but also create a series of upgrade opportunities for peripheral components, including OCS all-optical switches, liquid cooling, power supplies, and optical communications companies. (via UDN)

The upcoming Google TPU family is also expected to lift the broader semiconductor market, while the massive orders needed to power Google's worldwide servers and AI ecosystem will likely further constrain overall supply.
[19]
Google unveils TPU 8t and TPU 8i chips for agentic AI and reasoning workloads
At Google Cloud Next, Google announced its eighth-generation Tensor Processing Units (TPUs), introducing two purpose-built architectures: TPU 8t and TPU 8i. These chips are designed to support large-scale AI workloads, from model training and development to high-volume inference and agent-based systems. The new generation builds on over a decade of TPU development and continues Google's approach of co-designing silicon, networking, and software to improve efficiency and performance. TPUs have already powered foundation models such as Gemini.

AI systems are moving toward agent-based workflows that require reasoning, multi-step execution, and continuous learning loops. These workloads demand infrastructure capable of handling long context windows, complex logic, and high concurrency. To address this, TPU 8t and TPU 8i were developed in collaboration with Google DeepMind. The platforms are designed to support both large-scale training and real-time reasoning workloads, including emerging world models such as Genie 3 that simulate environments for agent learning.

Instead of a single general-purpose design, Google has introduced two specialized TPU systems. Both chips can run a range of workloads, but this separation allows each to deliver better efficiency for its target use case. TPU 8t focuses on accelerating model training by combining high compute throughput, memory capacity, and interconnect bandwidth. It is built to shorten training cycles while maintaining high system utilization. A single TPU 8t superpod scales to large configurations, allowing complex models to operate within a unified compute and memory environment. The architecture also introduces SparseCore for embedding workloads and native FP4 precision to improve compute efficiency and reduce memory overhead.

TPU 8i is optimized for latency-sensitive workloads, particularly those involving multiple AI agents operating concurrently. It is designed to reduce delays in communication, synchronization, and memory access. These changes enable TPU 8i to deliver up to 80% better performance-per-dollar compared to the previous generation, especially for reasoning and Mixture-of-Experts (MoE) models. Google has introduced workload-specific networking architectures for both TPU variants, aimed at maintaining efficiency as clusters scale to large sizes.

TPU 8t and TPU 8i support widely used frameworks, enabling developers to deploy models without major changes. Additional support includes native PyTorch (preview), Pallas and Mosaic for kernel-level optimization, and bare metal access without virtualization overhead.

Efficiency is a key focus for large-scale AI systems. TPU 8t and TPU 8i deliver up to 2x better performance-per-watt compared to the previous generation, with optimizations across the stack; the company states its data centers now deliver six times more compute per unit of electricity compared to five years ago.

Both TPU systems are part of Google Cloud's AI Hypercomputer, which combines compute, storage, networking, and software into a unified platform. This setup is designed to support the full AI lifecycle, from training to deployment and large-scale inference. Google said TPU 8t and TPU 8i will be generally available later this year through Google Cloud as part of its AI Hypercomputer platform. The company also noted that organizations such as Citadel Securities are already using TPUs for advanced AI workloads, indicating early adoption in production environments.
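Since the excerpt mentions Pallas and Mosaic for kernel-level optimization, here is a minimal Pallas kernel to give a flavor of what that layer looks like. It is a generic element-wise example using the public JAX API, not anything specific to TPU 8t or 8i.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Each program instance reads its block of inputs and writes one output block.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    # pallas_call lowers the kernel to the backend (Mosaic on TPU, Triton on GPU).
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8.0)
y = jnp.ones(8)
print(add(x, y))  # [1. 2. 3. 4. 5. 6. 7. 8.]
```

Real kernels add block specs and grids to tile large arrays, but the structure above is the entry point the excerpt is referring to.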
[20]
Google Rolls Out TPU 8t and 8i to Supercharge AI Training Speed
Google introduces TPU 8t and TPU 8i chips to accelerate AI training and real-time inference, boosting performance, reducing latency, and enabling scalable, energy-efficient infrastructure for next-generation agentic artificial intelligence systems worldwide. Google has introduced two new chips in its eighth-generation Tensor Processing Unit (TPU) lineup, an important step toward meeting the demand for processing AI workloads. The two chips, TPU 8t and TPU 8i, have been designed for different purposes, reflecting the growing demand for hardware specialization. Both chips run on an Axion ARM-based CPU host and use advanced liquid cooling systems. This combination improves performance while keeping energy consumption under control. The company said the new TPUs form part of its broader full-stack infrastructure, spanning networking, data centers, and energy-efficient operations.
[21]
Google launches next-generation AI chips to challenge Nvidia
Google is strengthening its semiconductor footprint by unveiling an eighth generation of Tensor Processing Units (TPUs) engineered for artificial intelligence. This evolution introduces a distinction between two processor types: one dedicated to model training and the other to real-world execution. Both chips are expected to be available by the end of 2026. This strategic move comes as part of a broader effort to better meet the surging demand for AI agent development and to optimize performance based on specific use cases.
[22]
Google Bets on New Chips to Boost AI Results, Challenging Nvidia
In a matter of months, Google's AI chips have become one of the hottest commodities in the tech sector. Leading artificial intelligence developers, including some of the firm's biggest rivals, are stocking up on them. Now, the Alphabet (GOOG)-owned company aims to build on its momentum with the likely introduction of new chips dedicated to inference, or running AI models after they've been trained. With this push, Google is poised to further challenge market leader Nvidia (NVDA) in a fast-growing category for semiconductors that's fueled by surging adoption of AI software. Bloomberg News AI Infrastructure Reporter Dina Bass joins Bloomberg Businessweek Daily to discuss. She speaks with Carol Massar and Tim Stenovec.
[23]
Google's new TPU 8t and TPU 8i explained: What the 8th-gen chips mean for AI agents
Google has revealed its eighth generation of custom TPUs at Cloud Next 2026, and unlike previous generations, this release is not just one but two different chips. The new TPU 8t and TPU 8i have been designed specifically for training and inference, respectively. The reason is simple: as AI models move from answering questions to autonomously executing multi-step tasks, the requirements for building those models and running them in production have branched considerably. Specialisation was simply the best route for tackling both problems.

The TPU 8t chip is Google's training beast. One superpod scales to 9,600 chips in total and delivers 121 Exaflops of compute capacity, with twice the inter-chip bandwidth compared to the last generation of TPUs. The objective here is to reduce the time it takes to build a frontier model from months to weeks. In addition, the chip offers 10x faster access to storage via TPUDirect and achieves over 97% productive compute time thanks to automated fault detection and routing, while being paired with Google's new Virgo Network to scale up to one million chips within a logical cluster.

The TPU 8i is designed for inference, specifically for the low-latency, high-throughput demands of AI agents working in swarms. Its standout feature is memory: 288 GB of high-bandwidth memory paired with 384 MB of on-chip SRAM, three times more than the previous generation. Keeping a model's active working set on-chip eliminates the processor idle time that compounds at agent scale. Google also doubled interconnect bandwidth to 19.2 Tb/s and introduced a new Boardfly topology that cuts maximum network diameter by over 50%. The result is 80% better performance-per-dollar compared to the last generation, meaning businesses can serve roughly twice the user volume at the same cost.

Both chips run on Google's own Axion ARM-based CPU host and support JAX, PyTorch, SGLang and vLLM out of the box, with bare metal access for customers who need it. They will be generally available later in 2026 as part of Google's AI Hypercomputer stack. The TPU has always been Google's answer to the question of what happens when you build silicon around the workload rather than the other way around. With agents now doing the work, the workload just got a lot more complex.
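The memory figures above are easier to appreciate with a rough sense of how fast a key-value cache grows with context length. The sketch below uses made-up model dimensions (they describe no particular Gemini model), just to show why both the 384 MB of SRAM and the 288 GB of HBM matter for long-context, multi-agent serving.

```python
# Back-of-envelope KV-cache sizing with illustrative, assumed model dimensions.
layers = 48          # transformer layers (assumed)
kv_heads = 8         # key/value heads after grouped-query attention (assumed)
head_dim = 128       # per-head dimension (assumed)
bytes_per_elem = 1   # FP8 storage (assumed)

def kv_cache_bytes(context_tokens: int) -> int:
    # 2x for keys and values, per layer, per KV head, per head dimension.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context_tokens

for tokens in (8_192, 131_072, 1_000_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9} tokens -> {gib:7.2f} GiB of KV cache")
```

Even under these modest assumptions the cache outgrows a 384 MB SRAM budget within a few thousand tokens, so the on-chip memory acts as a fast working set while HBM holds the rest.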
[24]
Google unveils new AI chips to rival Nvidia: Here is what they offer
Google has introduced two new chips to meet increasingly demanding AI workloads. The company has announced its eighth-generation Tensor Processing Units (TPUs), designed to power its custom-built supercomputers. The new chips are TPU 8t and TPU 8i, each built for a specific purpose. While one focuses on training powerful AI models faster, the other is designed to deliver quick and efficient responses. Both chips run on Google's Axion ARM-based CPU host and are supported by advanced liquid cooling technology, helping improve performance while keeping energy use in check. According to Google, 'both chips can run various workloads, but specialisation unlocks significant efficiencies and gains.' 'Along with our full-stack purpose-built infrastructure, from networking to data centres and energy-efficient operations, they create the underlying engine that will allow us to bring highly responsive agentic AI to the masses.'

Google describes TPU 8t as a 'training powerhouse.' TPU 8t is designed mainly for training large AI models, which usually takes a lot of time and computing power. Google claims this chip can significantly speed up that process, helping reduce development cycles from months to just weeks. The company says TPU 8t delivers nearly three times the compute performance compared to the previous generation.

Google describes TPU 8i as a 'reasoning engine.' According to the tech giant, this chip is built to handle the intricate, collaborative, iterative work of many specialised agents. 'TPU 8i is designed with more memory bandwidth to serve the most latency-sensitive inference workloads, which is critical because interactions between agents at scale magnify even small inefficiencies,' Google explained.
Google unveiled its eighth-generation Tensor Processing Units at Cloud Next, marking a strategic shift by splitting capabilities into two specialized chips. The TPU 8t targets AI model training with 2.8x performance gains, while TPU 8i focuses on inference with 80% better performance per dollar. Both chips ditch x86 for custom Axion ARM CPUs, signaling Google's push for full-stack efficiency in the agentic era.
Google announced a fundamental shift in its AI hardware strategy at Cloud Next in Las Vegas, unveiling eighth-generation Tensor Processing Units that separate training and inference workloads for the first time[1]. The company introduced the TPU 8t for model training and TPU 8i for inference, positioning these custom-built AI chips as purpose-built solutions for what it calls the "agentic era of AI"[1]. This dual-track approach mirrors strategies from Amazon Web Services, which recognized early that specialized AI hardware could eliminate bottlenecks specific to each workload[3]. Google claims the TPU 8t delivers up to 2.8x faster AI model training compared to last year's Ironwood TPUs, while the TPU 8i provides 80% better performance per dollar for large language model inference[2].
The TPU 8t for model training represents Google's commitment to reducing frontier model development from months to weeks[1]. Updated server clusters, called "pods," now house 9,600 chips with two petabytes of shared high-bandwidth memory, delivering 121 FP4 EFlops of compute per pod, nearly three times higher than Ironwood's training compute ceiling[1]. Each AI accelerator features 216 GB of high-bandwidth memory with 6.5 TB/s of bandwidth, 128 MB of on-chip SRAM, and up to 12.6 petaFLOPS of 4-bit floating point compute[3]. Google claims TPU 8t can scale linearly to support up to one million chips in a single logical cluster, using optical-circuit switches to connect up to 9,600 AI accelerators in a unified pod[3]. Multiple pods connect via the new Virgo Network in a flat two-tier topology, supporting up to 134,000 TPUs per data center[3].
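The pod-level and per-accelerator figures in that paragraph are mutually consistent, which is a quick check worth making on any vendor spec sheet; the division below uses only the numbers quoted above, not independent measurements.

```python
# Sanity check on figures quoted above (both from the article, not benchmarks).
pod_fp4_eflops = 121      # FP4 EFLOPS per TPU 8t superpod
chips_per_pod = 9_600

per_chip_pflops = pod_fp4_eflops * 1_000 / chips_per_pod
print(f"{per_chip_pflops:.1f} PFLOPS per chip")  # ~12.6, matching the per-accelerator spec
```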
Google emphasizes a "goodpute" rate of 97 percent for TPU 8t, meaning the chips spend more time actively advancing AI model training rather than waiting or handling faults[1]. Mark Lohmeyer, Google's vice president of compute and AI infrastructure, explained that at frontier training scale, every percentage point can translate into days of active training time[3]. The eighth-gen chips offer twice the performance per watt compared to Ironwood, with TPU 8t delivering 124% more performance per watt and TPU 8i providing a 117% gain[4]. Google Cloud has also developed a Managed Lustre storage system capable of delivering 10 TB/s of aggregate data directly into accelerator memory[3]. Data centers co-designed with TPUs feature integrated networking on a single chip and more efficient pod layouts, reportedly increasing computing power per unit of electricity by six times[1].

The TPU 8i for inference addresses the ongoing usage of models after users submit prompts, a workload fundamentally different from training[2]. Google tripled the on-chip SRAM to 384 MB per chip, matching Nvidia's approach with its upcoming Groq 3 LPU hardware[5]. This larger cache allows TPU 8i to keep more key-value information on-chip, speeding up models with longer context windows[1]. Inference pods now contain 1,152 chips versus just 256 for Ironwood inference clusters, delivering 11.6 EFlops per pod[1]. The architecture is designed "to deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively," according to Alphabet CEO Sundar Pichai[5]. Lohmeyer noted that "the number of transactions is going way up, and the cost per transaction needs to go way down for it to scale."
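The on-chip key-value cache mentioned above is the standard mechanism behind fast autoregressive decoding. The minimal single-head sketch below shows it: each new token appends its key and value to a cache and attends over everything stored so far, so keeping that cache in fast memory directly shortens every decode step. The shapes and the single-head simplification are illustrative, not a description of Google's serving stack.

```python
import jax
import jax.numpy as jnp

# One decode step of single-head attention with a growing KV cache (illustrative only).
def decode_step(q, k_new, v_new, k_cache, v_cache):
    k_cache = jnp.concatenate([k_cache, k_new[None, :]], axis=0)  # append new key
    v_cache = jnp.concatenate([v_cache, v_new[None, :]], axis=0)  # append new value
    scores = k_cache @ q / jnp.sqrt(q.shape[-1])                  # attend over all cached keys
    weights = jax.nn.softmax(scores)
    out = weights @ v_cache                                       # weighted sum of cached values
    return out, k_cache, v_cache

d = 64
k_cache = jnp.zeros((0, d))
v_cache = jnp.zeros((0, d))
for step in range(4):
    q = k_new = v_new = jnp.ones((d,))
    out, k_cache, v_cache = decode_step(q, k_new, v_new, k_cache, v_cache)
print(out.shape, k_cache.shape)  # (64,) (4, 64)
```

Without the cache, every step would recompute keys and values for the entire context, which is exactly the waste that long context windows make expensive.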
The eighth-gen AI accelerators mark the first from Google to rely solely on its custom Axion ARM CPU host, featuring one CPU for every two TPUs compared to Ironwood's ratio of one x86 CPU servicing four TPU chips[1]. This "full-stack" ARM-based approach allows for greater efficiency, following a similar path to Amazon's integration of Graviton and Trainium 3 earlier this year[3]. Google has also adapted its fourth-gen liquid cooling setup to the new chips, using actively controlled valves to adjust water flow based on workload[1]. Both new TPUs support frameworks developers already use, including JAX, MaxText, PyTorch, SGLang, and vLLM[1].

Google is not replacing Nvidia entirely, continuing to offer services based on Nvidia chips and promising to deploy the upcoming Vera Rubin chip later this year[2]. The company announced a collaboration with Nvidia to engineer computer networking that allows Nvidia-based systems to perform more efficiently in its cloud infrastructure, particularly enhancing the software-based networking tech called Falcon[2]. Nvidia's stock price briefly dropped about 1.5 percent after Google's announcement[1]. DA Davidson analysts estimated in September that the Google TPU business, coupled with Google DeepMind, would be worth about $900 billion[5]. Adoption is ramping up, with Citadel Securities building quantitative research software on TPUs, all 17 U.S. Energy Department national laboratories using AI co-scientist software built on the chips, and Anthropic committing to using multiple gigawatts worth of Google TPUs[5]. Both chips will power Google's Gemini-based agents and become generally available later this year[1].
Summarized by Navi