5 Sources
[1]
Google deploys new Axion CPUs and seventh-gen Ironwood TPU -- training and inferencing pods beat Nvidia GB300 and shape 'AI Hypercomputer' model
Today, Google Cloud introduced new AI-oriented instances powered by its own Axion CPUs and Ironwood TPUs. The new instances target both training and low-latency inference of large-scale AI models; their key feature is efficient scaling, enabled by the very large scale-up world size of Google's Ironwood-based systems.

Ironwood is Google's seventh-generation tensor processing unit (TPU). It delivers 4,614 FP8 TFLOPS of performance and is equipped with 192 GB of HBM3E memory offering a bandwidth of up to 7.37 TB/s. Ironwood pods scale up to 9,216 AI accelerators, delivering a total of 42.5 FP8 ExaFLOPS for training and inference, which far exceeds the FP8 capability of Nvidia's GB300 NVL72 system, which stands at 0.36 ExaFLOPS. The pod is interconnected using a proprietary 9.6 Tb/s Inter-Chip Interconnect network and carries roughly 1.77 PB of HBM3E memory in total, once again exceeding what Nvidia's competing platform offers.

Ironwood pods -- based on Axion CPUs and Ironwood TPUs -- can be joined into clusters running hundreds of thousands of TPUs, which form part of Google's aptly dubbed AI Hypercomputer. This is an integrated supercomputing platform uniting compute, storage, and networking under one management layer. To boost the reliability of both ultra-large pods and the AI Hypercomputer, Google uses its reconfigurable fabric, named Optical Circuit Switching, which instantly routes around any hardware interruption to sustain continuous operation. IDC data credits the AI Hypercomputer model with an average 353% three-year ROI, 28% lower IT spending, and 55% higher operational efficiency for enterprise customers.

Several companies are already adopting Google's Ironwood-based platform. Anthropic plans to use as many as one million TPUs to operate and expand its Claude model family, citing major cost-to-performance gains. Lightricks has also begun deploying Ironwood to train and serve its LTX-2 multimodal system.

Although AI accelerators like Google's Ironwood tend to steal the thunder in the AI era of computing, CPUs remain crucial for application logic and service hosting, as well as for running some AI workloads, such as data ingestion. So, along with its seventh-generation TPUs, Google is also deploying its first Armv9-based general-purpose processors, named Axion. Google has not published the full die specifications for its Axion CPUs: there is no confirmed core count per die (beyond up to 96 vCPUs and up to 768 GB of DDR5 memory for the C4A metal instance), no disclosed clock speeds, and no publicly detailed process node. What we do know is that Axion is built around the Arm Neoverse V2 platform and is designed to offer up to 50% greater performance and up to 60% higher energy efficiency than modern x86 CPUs, as well as 30% higher performance than 'the fastest general-purpose Arm-based instances available in the cloud today.' There are reports that the CPU offers 2 MB of private L2 cache per core, 80 MB of L3 cache, support for DDR5-5600 memory, and Uniform Memory Access (UMA) for nodes. Servers running Google's Axion CPUs and Ironwood TPUs come equipped with the company's custom Titanium-branded controllers, which offload networking, security, and storage I/O processing from the host CPU, enabling better management and higher performance. In general, Axion CPUs can serve both AI servers and general-purpose servers for a variety of tasks.
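Before moving on to the individual instance types, it is worth sanity-checking how the pod-level Ironwood figures above follow from the per-chip specifications. A quick back-of-envelope check in Python (a sketch; the per-chip numbers, pod size, and the 0.36 ExaFLOPS figure cited for Nvidia's GB300 NVL72 come from the article, while the totals and the ratio are simply derived):

```python
# Back-of-envelope check of the Ironwood pod figures quoted above.
# Per-chip values and pod size are from the article; the totals are derived.

per_chip_fp8_tflops = 4_614          # FP8 TFLOPS per Ironwood chip
per_chip_hbm_gb = 192                # GB of HBM3E per chip
chips_per_pod = 9_216                # chips in a full Ironwood pod

pod_fp8_exaflops = per_chip_fp8_tflops * chips_per_pod / 1e6   # TFLOPS -> ExaFLOPS
pod_hbm_pb = per_chip_hbm_gb * chips_per_pod / 1e6             # GB -> PB (decimal units)

gb300_nvl72_fp8_exaflops = 0.36      # FP8 figure the article cites for Nvidia's system

print(f"Pod FP8 compute:  {pod_fp8_exaflops:.1f} ExaFLOPS")    # ~42.5
print(f"Pod HBM capacity: {pod_hbm_pb:.2f} PB")                # ~1.77
print(f"Ratio vs. GB300 NVL72: {pod_fp8_exaflops / gb300_nvl72_fp8_exaflops:.0f}x")  # ~118
```

The derived totals line up with the 42.5 ExaFLOPS, 1.77 PB, and roughly 118x figures quoted across the source articles.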
For now, Google offers three Axion configurations: C4A, N4A, and C4A metal. The C4A is the first and primary offering in Google's family of Axion-powered instances, and the only one that is generally available today. It provides up to 72 vCPUs, 576 GB of DDR5 memory, and 100 Gbps networking, paired with Titanium SSD storage of up to 6 TB of local capacity; the instance is optimized for sustained high performance across various applications. Next up is the N4A instance, which is also aimed at general workloads such as data processing, web services, and development environments, but it scales up to 64 vCPUs, 512 GB of DDR5 RAM, and 50 Gbps networking, making it a more affordable offering. The other preview model is C4A metal, a bare-metal configuration that presumably exposes the full Axion hardware stack directly to customers: up to 96 vCPUs, 768 GB of DDR5 memory, and 100 Gbps networking. The instance is meant for specialized or license-restricted applications and Arm-native development.

These launches build upon a decade of Google's custom silicon development, which began with the original TPU and continued through YouTube's VCUs, Tensor mobile processors, and Titanium infrastructure. The Axion CPU -- Google's first Arm-based general-purpose server processor -- completes the portfolio of the company's custom chips, and the Ironwood TPU sets the stage for competition against the best AI accelerators on the market.
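The three instance shapes described above differ mainly in scale and access model. A minimal sketch of how their published limits could be compared programmatically (the vCPU, memory, and networking figures are those quoted in the article; the dictionary layout, the smallest_fit helper, and its selection rule are purely illustrative and not part of any Google Cloud API):

```python
# Published limits of the three Axion instance families, per the article.
# The chooser below is a purely illustrative helper, not a Google Cloud API.

AXION_INSTANCES = {
    "C4A":       {"max_vcpus": 72, "max_mem_gb": 576, "net_gbps": 100, "status": "GA"},
    "N4A":       {"max_vcpus": 64, "max_mem_gb": 512, "net_gbps": 50,  "status": "preview"},
    "C4A metal": {"max_vcpus": 96, "max_mem_gb": 768, "net_gbps": 100, "status": "preview"},
}

def smallest_fit(vcpus_needed, mem_gb_needed):
    """Return the smallest Axion family whose published limits cover the request."""
    candidates = [
        (spec["max_vcpus"], name)
        for name, spec in AXION_INSTANCES.items()
        if spec["max_vcpus"] >= vcpus_needed and spec["max_mem_gb"] >= mem_gb_needed
    ]
    return min(candidates)[1] if candidates else None

print(smallest_fit(48, 384))   # -> "N4A"
print(smallest_fit(80, 700))   # -> "C4A metal"
```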
[2]
Google's rolling out its most powerful AI chip, taking aim at Nvidia with custom silicon
Sundar Pichai, chief executive officer of Alphabet Inc., during the Bloomberg Tech conference in San Francisco, California, US, on Wednesday, June 4, 2025.

Google is making its most powerful chip yet widely available, the search giant's latest effort to try to win business from artificial intelligence companies by offering custom silicon. The company said on Thursday that the seventh generation of its Tensor Processing Unit (TPU), called Ironwood, will hit the market for public use in the coming weeks, after it was initially introduced in April for testing and deployment.

The chip, built in-house, is designed to handle everything from the training of large models to powering real-time chatbots and AI agents. In connecting up to 9,216 chips in a single pod, Google says the new Ironwood TPUs eliminate "data bottlenecks for the most demanding models" and give customers "the ability to run and scale the largest, most data-intensive models in existence."

Google is in the midst of an ultra-high-stakes race, alongside rivals Microsoft, Amazon and Meta, to build out the AI infrastructure of the future. While the majority of large language models and AI workloads have relied on Nvidia's graphics processing units (GPUs), Google's TPUs fall into the category of custom silicon, which can offer advantages on price, performance and efficiency. TPUs have been in the works for a decade. Ironwood, according to Google, is more than four times faster than its predecessor, and major customers are already lining up. AI startup Anthropic plans to use up to 1 million of the new TPUs to run its Claude model, Google said.

Alongside the new chip, Google is rolling out a suite of upgrades meant to make its cloud cheaper, faster, and more flexible, as it vies with larger cloud players Amazon Web Services and Microsoft Azure. In its earnings report last week, Google reported third-quarter cloud revenue of $15.15 billion, a 34% increase from the same period a year earlier. Azure revenue jumped 40%, while Amazon reported 20% growth for AWS. Google said it's signed more billion-dollar cloud deals in the first nine months of 2025 than in the previous two years combined. To meet soaring demand, Google upped the high end of its forecast for capital spending this year to $93 billion from $85 billion.

"We are seeing substantial demand for our AI infrastructure products, including TPU-based and GPU-based solutions," CEO Sundar Pichai said on the earnings call. "It is one of the key drivers of our growth over the past year, and I think on a going-forward basis, I think we continue to see very strong demand, and we are investing to meet that."
[3]
Google debuts AI chips with 4X performance boost, secures Anthropic megadeal worth billions
Google Cloud is introducing what it calls its most powerful artificial intelligence infrastructure to date, unveiling a seventh-generation Tensor Processing Unit and expanded Arm-based computing options designed to meet surging demand for AI model deployment -- what the company characterizes as a fundamental industry shift from training models to serving them to billions of users.

The announcement, made Thursday, centers on Ironwood, Google's latest custom AI accelerator chip, which will become generally available in the coming weeks. In a striking validation of the technology, Anthropic, the AI safety company behind the Claude family of models, disclosed plans to access up to one million of these TPU chips -- a commitment worth tens of billions of dollars and among the largest known AI infrastructure deals to date.

The move underscores an intensifying competition among cloud providers to control the infrastructure layer powering artificial intelligence, even as questions mount about whether the industry can sustain its current pace of capital expenditure. Google's approach -- building custom silicon rather than relying solely on Nvidia's dominant GPU chips -- amounts to a long-term bet that vertical integration from chip design through software will deliver superior economics and performance.

Why companies are racing to serve AI models, not just train them

Google executives framed the announcements around what they call "the age of inference" -- a transition point where companies shift resources from training frontier AI models to deploying them in production applications serving millions or billions of requests daily.

"Today's frontier models, including Google's Gemini, Veo, and Imagen and Anthropic's Claude, train and serve on Tensor Processing Units," said Amin Vahdat, vice president and general manager of AI and Infrastructure at Google Cloud. "For many organizations, the focus is shifting from training these models to powering useful, responsive interactions with them."

This transition has profound implications for infrastructure requirements. Where training workloads can often tolerate batch processing and longer completion times, inference -- the process of actually running a trained model to generate responses -- demands consistently low latency, high throughput, and unwavering reliability. A chatbot that takes 30 seconds to respond, or a coding assistant that frequently times out, becomes unusable regardless of the underlying model's capabilities. Agentic workflows -- where AI systems take autonomous actions rather than simply responding to prompts -- create particularly complex infrastructure challenges, requiring tight coordination between specialized AI accelerators and general-purpose computing.

Inside Ironwood's architecture: 9,216 chips working as one supercomputer

Ironwood is more than an incremental improvement over Google's sixth-generation TPUs. According to technical specifications shared by the company, it delivers more than four times better performance for both training and inference workloads compared to its predecessor -- gains that Google attributes to a system-level co-design approach rather than simply increasing transistor counts.

The architecture's most striking feature is its scale. A single Ironwood "pod" -- a tightly integrated unit of TPU chips functioning as one supercomputer -- can connect up to 9,216 individual chips through Google's proprietary Inter-Chip Interconnect network operating at 9.6 terabits per second.
To put that bandwidth in perspective, it's roughly equivalent to downloading the entire Library of Congress in under two seconds. This massive interconnect fabric allows the 9,216 chips to share access to 1.77 petabytes of High Bandwidth Memory -- memory fast enough to keep pace with the chips' processing speeds. That's approximately 40,000 high-definition Blu-ray movies' worth of working memory, instantly accessible by thousands of processors simultaneously. "For context, that means Ironwood Pods can deliver 118x more FP8 ExaFLOPS versus the next closest competitor," Google stated in technical documentation.

The system employs Optical Circuit Switching technology that acts as a "dynamic, reconfigurable fabric." When individual components fail or require maintenance -- inevitable at this scale -- the OCS technology automatically reroutes data traffic around the interruption within milliseconds, allowing workloads to continue running without user-visible disruption. This reliability focus reflects lessons learned from deploying five previous TPU generations. Google reported that its fleet-wide uptime for liquid-cooled systems has maintained approximately 99.999% availability since 2020 -- equivalent to less than six minutes of downtime per year.

Anthropic's billion-dollar bet validates Google's custom silicon strategy

Perhaps the most significant external validation of Ironwood's capabilities comes from Anthropic's commitment to access up to one million TPU chips -- a staggering figure in an industry where even clusters of 10,000 to 50,000 accelerators are considered massive.

"Anthropic and Google have a longstanding partnership and this latest expansion will help us continue to grow the compute we need to define the frontier of AI," said Krishna Rao, Anthropic's chief financial officer, in the official partnership agreement. "Our customers -- from Fortune 500 companies to AI-native startups -- depend on Claude for their most important work, and this expanded capacity ensures we can meet our exponentially growing demand."

According to a separate statement, Anthropic will have access to "well over a gigawatt of capacity coming online in 2026" -- enough electricity to power a small city. The company specifically cited TPUs' "price-performance and efficiency" as key factors in the decision, along with "existing experience in training and serving its models with TPUs." Industry analysts estimate that a commitment to access one million TPU chips, with associated infrastructure, networking, power, and cooling, likely represents a multi-year contract worth tens of billions of dollars -- among the largest known cloud infrastructure commitments in history.

James Bradbury, Anthropic's head of compute, elaborated on the inference focus: "Ironwood's improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect."

Google's Axion processors target the computing workloads that make AI possible

Alongside Ironwood, Google introduced expanded options for its Axion processor family -- custom Arm-based CPUs designed for general-purpose workloads that support AI applications but don't require specialized accelerators. The N4A instance type, now entering preview, targets what Google describes as "microservices, containerized applications, open-source databases, batch, data analytics, development environments, experimentation, data preparation and web serving jobs that make AI applications possible."
The company claims N4A delivers up to 2X better price-performance than comparable current-generation x86-based virtual machines. Google is also previewing C4A metal, its first bare-metal Arm instance, which provides dedicated physical servers for specialized workloads such as Android development, automotive systems, and software with strict licensing requirements.

The Axion strategy reflects a growing conviction that the future of computing infrastructure requires both specialized AI accelerators and highly efficient general-purpose processors. While a TPU handles the computationally intensive task of running an AI model, Axion-class processors manage data ingestion, preprocessing, application logic, API serving, and countless other tasks in a modern AI application stack.

Early customer results suggest the approach delivers measurable economic benefits. Vimeo reported observing "a 30% improvement in performance for our core transcoding workload compared to comparable x86 VMs" in initial N4A tests. ZoomInfo measured "a 60% improvement in price-performance" for data processing pipelines running on Java services, according to Sergei Koren, the company's chief infrastructure architect.

Software tools turn raw silicon performance into developer productivity

Hardware performance means little if developers cannot easily harness it. Google emphasized that Ironwood and Axion are integrated into what it calls AI Hypercomputer -- "an integrated supercomputing system that brings together compute, networking, storage, and software to improve system-level performance and efficiency." According to an October 2025 IDC Business Value Snapshot study, AI Hypercomputer customers achieved on average 353% three-year return on investment, 28% lower IT costs, and 55% more efficient IT teams.

Google disclosed several software enhancements designed to maximize Ironwood utilization. Google Kubernetes Engine now offers advanced maintenance and topology awareness for TPU clusters, enabling intelligent scheduling and highly resilient deployments. The company's open-source MaxText framework now supports advanced training techniques including Supervised Fine-Tuning and Generative Reinforcement Policy Optimization.

Perhaps most significant for production deployments, Google's Inference Gateway intelligently load-balances requests across model servers to optimize critical metrics. According to Google, it can reduce time-to-first-token latency by 96% and serving costs by up to 30% through techniques like prefix-cache-aware routing. The Inference Gateway monitors key metrics including KV cache hits, GPU or TPU utilization, and request queue length, then routes incoming requests to the optimal replica. For conversational AI applications where multiple requests might share context, routing requests with shared prefixes to the same server instance can dramatically reduce redundant computation.

The hidden challenge: powering and cooling one-megawatt server racks

Behind these announcements lies a massive physical infrastructure challenge that Google addressed at the recent Open Compute Project EMEA Summit. The company disclosed that it's implementing +/-400 volt direct current power delivery capable of supporting up to one megawatt per rack -- a tenfold increase from typical deployments.

"The AI era requires even greater power delivery capabilities," explained Madhusudan Iyengar and Amber Huffman, Google principal engineers, in an April 2025 blog post. "ML will require more than 500 kW per IT rack before 2030."
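Those figures make the electrical problem concrete: at a fixed power level, the current a rack must carry scales inversely with the distribution voltage. A rough sketch of the arithmetic (the 1 MW per rack and 400 VDC values are from the article; the 48 V comparison point is a common legacy rack-bus voltage used here purely for illustration, not something Google cited):

```python
# Rough current draw for a 1 MW rack at different distribution voltages.
# P = V * I, so I = P / V: lower voltage means proportionally more current
# (and heavier busbars / more resistive loss) for the same delivered power.

rack_power_w = 1_000_000        # 1 MW per rack, per Google's OCP disclosure

for volts in (48, 400):         # 48 V is a typical legacy rack bus; 400 VDC is the new target
    amps = rack_power_w / volts
    print(f"{volts:>4} V -> {amps:,.0f} A")

# 48 V  -> ~20,833 A  (impractical to distribute at rack scale)
# 400 V -> ~2,500 A   (still large, but within reach of EV-derived components)
```

This is one way to read the choice of 400 VDC: it keeps per-rack current within the range that the electric-vehicle supply chain already handles.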
Google is collaborating with Meta and Microsoft to standardize electrical and mechanical interfaces for high-voltage DC distribution. The company selected 400 VDC specifically to leverage the supply chain established by electric vehicles, "for greater economies of scale, more efficient manufacturing, and improved quality and scale."

On cooling, Google revealed it will contribute its fifth-generation cooling distribution unit design to the Open Compute Project. The company has deployed liquid cooling "at GigaWatt scale across more than 2,000 TPU Pods in the past seven years" with fleet-wide availability of approximately 99.999%. Water can transport approximately 4,000 times more heat per unit volume than air for a given temperature change -- critical as individual AI accelerator chips increasingly dissipate 1,000 watts or more.

Custom silicon gambit challenges Nvidia's AI accelerator dominance

Google's announcements come as the AI infrastructure market reaches an inflection point. While Nvidia maintains overwhelming dominance in AI accelerators -- holding an estimated 80-95% market share -- cloud providers are increasingly investing in custom silicon to differentiate their offerings and improve unit economics. Amazon Web Services pioneered this approach with Graviton Arm-based CPUs and Inferentia / Trainium AI chips. Microsoft has developed Cobalt processors and is reportedly working on AI accelerators. Google now offers the most comprehensive custom silicon portfolio among major cloud providers.

The strategy faces inherent challenges. Custom chip development requires enormous upfront investment -- often billions of dollars. The software ecosystem for specialized accelerators lags behind Nvidia's CUDA platform, which benefits from 15+ years of developer tools. And rapid AI model architecture evolution creates risk that custom silicon optimized for today's models becomes less relevant as new techniques emerge.

Yet Google argues its approach delivers unique advantages. "This is how we built the first TPU ten years ago, which in turn unlocked the invention of the Transformer eight years ago -- the very architecture that powers most of modern AI," the company noted, referring to the seminal "Attention Is All You Need" paper from Google researchers in 2017. The argument is that tight integration -- "model research, software, and hardware development under one roof" -- enables optimizations impossible with off-the-shelf components.

Beyond Anthropic, several other customers provided early feedback. Lightricks, which develops creative AI tools, reported that early Ironwood testing "makes us highly enthusiastic" about creating "more nuanced, precise, and higher-fidelity image and video generation for our millions of global customers," said Yoav HaCohen, the company's research director.

Google's announcements raise questions that will play out over coming quarters. Can the industry sustain current infrastructure spending, with major AI companies collectively committing hundreds of billions of dollars? Will custom silicon prove economically superior to Nvidia GPUs? How will model architectures evolve? For now, Google appears committed to a strategy that has defined the company for decades: building custom infrastructure to enable applications impossible on commodity hardware, then making that infrastructure available to customers who want similar capabilities without the capital investment.
As the AI industry transitions from research labs to production deployments serving billions of users, that infrastructure layer -- the silicon, software, networking, power, and cooling that make it all run -- may prove as important as the models themselves. And if Anthropic's willingness to commit to accessing up to one million chips is any indication, Google's bet on custom silicon designed specifically for the age of inference may be paying off just as demand reaches its inflection point.
[4]
Google unleashes Ironwood TPUs, new Axion instances as AI inference demand surges - SiliconANGLE
Google LLC today announced it's bringing its custom Ironwood chips online for cloud customers, unleashing tensor processing units that can scale up to 9,216 chips in a single pod to become the company's most powerful AI accelerator architecture to date. The new chips will be available to customers in the coming weeks, alongside new Arm-based Axion instances that promise up to twice the price-performance of current x86-based alternatives.

Google's own frontier models, including Gemini, Veo and Imagen, are trained and deployed using TPUs, alongside equally sizable third-party models such as Anthropic PBC's Claude. The company said the advent of AI agents, which require deep reasoning and advanced task management, is defining a new era where inference -- the runtime intelligence of active models -- has greatly increased the demand for AI compute.

The tech giant debuted Ironwood at Google Cloud Next 2025 in April and touted it as the most powerful TPU accelerator the company has ever built. The next-generation architecture allows the company to scale up to 9,216 chips in a single server pod, linked together with inter-chip interconnect to provide up to 9.6 terabits per second of bandwidth. The chips can be connected to a colossal 1.77 petabytes of shared high-bandwidth memory, or HBM.

Inter-chip interconnect, or ICI, acts as a "data highway" for chips, allowing them to think and act as a single AI accelerator brain. This is important because modern-day AI models require significant processing power, but they can't fit on single chips and must be split up across hundreds or thousands of processors for parallel processing. Just like thousands of buildings crammed together in a city, the biggest problem this kind of system faces is traffic congestion. With more bandwidth, the chips can talk faster and with less delay.

HBM holds the vast amount of real-time data AI models need to "remember" when training or processing queries from users. According to Google, the 1.77 petabytes of accessible data in a single, unified system is industry-leading. A single petabyte, or 1,000 terabytes, can represent around 40,000 high-definition Blu-ray movies or the text of millions of books. Making all of this accessible at once lets AI models respond instantly and intelligently with enormous amounts of knowledge.

The company said the new Ironwood-based pod architecture can deliver more than 118x more FP8 ExaFLOPS than the nearest competitor and 4x better performance for training and inference than Trillium, the previous generation of TPU.

Google included a new software layer on top of this advanced hardware, co-designed to maximize Ironwood's capabilities and memory. This includes a new Cluster Director capability in Google Kubernetes Engine, which enables advanced maintenance and topology awareness for better process scheduling. For pretraining and post-training, the company announced enhancements to MaxText, a high-performance, open-source large language model training framework, for implementing reinforcement learning techniques. Google also recently announced upgrades to vLLM to support inference switching between GPUs and TPUs, or a hybrid approach.

Anthropic, an early user of Ironwood, said that the chips provided impressive price-performance gains, allowing it to serve massive Claude models at scale. The leading AI model developer and provider announced late last month that it plans to access up to 1 million TPUs.
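For a rough sense of what an up-to-one-million-TPU commitment implies, the per-chip and per-pod figures quoted across these articles can simply be multiplied out (a back-of-envelope sketch; the fleet-level totals below are derived for illustration and are not figures disclosed by Google or Anthropic):

```python
# Rough scale of an up-to-one-million-TPU commitment, derived from
# published per-chip/per-pod figures; none of these totals are official.

total_chips = 1_000_000
chips_per_pod = 9_216
per_chip_fp8_tflops = 4_614
per_chip_hbm_gb = 192

full_pods = total_chips // chips_per_pod                          # 108 complete pods, plus a partial one
fleet_fp8_zettaflops = total_chips * per_chip_fp8_tflops / 1e9    # TFLOPS -> ZettaFLOPS
fleet_hbm_pb = total_chips * per_chip_hbm_gb / 1e6                # GB -> PB

print(f"~{full_pods} full Ironwood pods (plus change)")           # ~108
print(f"~{fleet_fp8_zettaflops:.1f} FP8 ZettaFLOPS aggregate")    # ~4.6
print(f"~{fleet_hbm_pb:.0f} PB of HBM aggregate")                 # ~192
```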
"Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work," Anthropic's Head of Compute James Bradbury said. "As demand continues to grow exponentially, we're increasing our compute resources as we push the boundaries of AI research and product development." Google also announced the expansion of its Axion offerings with two new services in preview: N4A, its second-generation Axion virtual machines, and C4A metal, the company's first Arm Ltd.-based bare-metal instances. Axion is the company's custom Arm-based central processing unit, designed to provide energy-efficient performance for general-purpose workloads. Google executives noted that the key to Axion's design philosophy is its compatibility with the company's workload-optimized infrastructure strategy. It uses Arm's expertise in efficient CPU design to deliver significant performance and power use enhancements over traditional x86 processors. "The Axion processors will have 30% higher performance than the fastest Arm processors available in the cloud today," Mark Lohmeyer, vice president and general manager of AI and computing infrastructure at Google Cloud, said in an exclusive broadcast on theCUBE, SiliconANGLE Media's livestreaming studio, during Google Cloud Next 2024. "They'll have 50% higher performance than comparable x86 generation processors and 60% better energy efficiency than comparable x86-based instances." Axion provides greatly increased efficiency for modern general-purpose AI workflows and it can be coupled with the new specialized Ironwood accelerators to handle complex model serving. The new Axion instances are designed to provide operational backbone, such as high-volume data preparation, ingestion, analytics and running the virtual services that host intelligent applications. N4A instances support up to 64 virtual CPUs and 512 gigabytes of DDR5 memory, with support for custom machine types. The new C4A metal delivers dedicated physical servers with up to 96 vCPUs and 768 gigabytes of memory. These two new services join the company's previously announced C4A instances designed for consistent high performance.
[5]
Google to offer Ironwood TPU for public use; Anthropic among first major clients
Ironwood, the seventh-generation tensor processing unit (TPU) launched by the search giant in April this year for testing and deployment, is built by linking up to 9,216 chips in one pod. It removes data bottlenecks to let customers run and scale the largest, most data-intensive models.

Tech giant Google will release its specialised chip, called Ironwood, designed to run artificial intelligence (AI) models, for public use in the coming weeks. Per a report by CNBC on Thursday, Google will also announce upgrades to Cloud, making it "cheaper, faster, and more flexible." The report added that AI startup Anthropic is also planning to use up to one million of the new TPUs to run its Claude model.

Ironwood is the seventh-generation Tensor Processing Unit (TPU) launched by the search giant in April this year for testing and deployment. Google's in-house Ironwood TPU is built to support both training and real-time AI workloads, including chatbots and AI agents. By linking up to 9,216 chips in one pod, it removes data bottlenecks and lets customers run and scale the largest, most data-intensive models.

Google competes with the likes of Microsoft, Amazon, and Meta to build next-generation AI infrastructure. While most major AI models still run on Nvidia GPUs, Google's custom TPU chips offer potential advantages in cost, performance, and efficiency, the company had said earlier in a blog post.

Technical insights
* It is an enormous cluster of up to 9,216 liquid-cooled chips working together as a single unit.
* These chips are linked with inter-chip interconnect (ICI) networking that consumes 10 megawatts of power.
* For customers, the chip is available in two scalable configurations: 256 chips or a full 9,216-chip cluster (see the back-of-envelope sketch after this article).

Key features of Ironwood
* Ironwood is built to handle the heavy computation and communication needs of advanced "thinking models" such as large language models (LLMs), mixture-of-experts (MoE) models and advanced reasoning systems.
* It allows AI workloads to run more cost-effectively; Google claims that Ironwood is nearly 30x more power-efficient than its first Cloud TPU from 2018.
* It offers 192 GB of memory per chip -- six times that of Trillium, Google's sixth-generation TPU announced last year -- making it easier to process larger models and data sets.

Google parent Alphabet reported its first-ever $100 billion quarterly revenue on October 30, led by strong growth across its core search business and a rapidly expanding cloud division buoyed by AI. The company's ambitious approach to offering AI "is delivering strong momentum, and we're shipping at speed," CEO Sundar Pichai said.
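The two configurations mentioned in the technical insights above sit at very different scales. Combining them with the per-chip figures quoted in the other articles gives a rough picture of what each offers (a sketch; the per-configuration totals are derived, not separately published by Google):

```python
# Rough per-configuration totals for the two Ironwood offerings
# (256 chips or a full 9,216-chip pod), derived from the per-chip figures.

per_chip_fp8_tflops = 4_614
per_chip_hbm_gb = 192

for chips in (256, 9_216):
    exaflops = chips * per_chip_fp8_tflops / 1e6   # TFLOPS -> ExaFLOPS
    hbm_tb = chips * per_chip_hbm_gb / 1e3         # GB -> TB
    print(f"{chips:>5} chips: ~{exaflops:5.2f} FP8 ExaFLOPS, ~{hbm_tb:,.0f} TB HBM")

#   256 chips: ~ 1.18 FP8 ExaFLOPS, ~49 TB HBM
#  9216 chips: ~42.52 FP8 ExaFLOPS, ~1,769 TB HBM
```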
Google Cloud introduces its most powerful AI infrastructure yet with seventh-generation Ironwood TPUs and custom Axion CPUs, securing major partnerships including Anthropic's multibillion-dollar commitment to use up to one million TPUs for its Claude models.

Google Cloud has unveiled its most ambitious AI infrastructure initiative to date, introducing the seventh-generation Ironwood Tensor Processing Units (TPUs) and custom Axion CPUs designed to challenge Nvidia's dominance in the AI accelerator market [1]. The announcement represents a decade-long investment in custom silicon development, positioning Google as a formidable competitor in the ultra-high-stakes race to build the AI infrastructure of the future [2].

The Ironwood TPU delivers 4,614 FP8 TFLOPS of performance per chip and comes equipped with 192 GB of HBM3E memory, offering bandwidth of up to 7.37 TB/s [1]. The architecture's most striking feature is its ability to scale up to 9,216 chips in a single pod, connected through Google's proprietary Inter-Chip Interconnect network operating at 9.6 terabits per second [3]. This massive interconnect fabric provides access to 1.77 petabytes of shared high-bandwidth memory across the entire pod, delivering a total of 42.5 FP8 ExaFLOPS for training and inference workloads [1]. According to Google's specifications, this performance significantly exceeds Nvidia's GB300 NVL72 system, which delivers 0.36 ExaFLOPS [4].

In a striking validation of Google's technology, Anthropic has committed to accessing up to one million Ironwood TPUs to operate and expand its Claude model family [2]. This commitment represents one of the largest known AI infrastructure deals to date, worth tens of billions of dollars and demonstrating major cost-to-performance gains compared to alternative solutions [3]. "Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work," said James Bradbury, Anthropic's Head of Compute. "As demand continues to grow exponentially, we're increasing our compute resources as we push the boundaries of AI research and product development" [4].

Alongside the Ironwood TPUs, Google introduced its first Armv9-based general-purpose processors, named Axion, built around the Arm Neoverse V2 platform [1]. The Axion CPUs are designed to offer up to 50% greater performance and up to 60% higher energy efficiency compared to modern x86 CPUs, while providing 30% higher performance than the fastest general-purpose Arm-based instances currently available in the cloud [4].

Google offers three Axion configurations: the C4A instance with up to 72 vCPUs and 576 GB of DDR5 memory, the N4A instance scaling to 64 vCPUs and 512 GB of RAM for more affordable workloads, and the C4A metal bare-metal configuration exposing up to 96 vCPUs and 768 GB of memory for specialized applications [1].