9 Sources
[1]
Google deploys new Axion CPUs and seventh-gen Ironwood TPU -- training and inferencing pods beat Nvidia GB300 and shape 'AI Hypercomputer' model
Today, Google Cloud introduced new AI-oriented instances powered by its own Axion CPUs and Ironwood TPUs. The new instances are aimed at both training and low-latency inference of large-scale AI models; their key feature is efficient scaling, enabled by the very large scale-up world size of Google's Ironwood-based systems.

Ironwood is Google's seventh-generation tensor processing unit (TPU). It delivers 4,614 FP8 TFLOPS of performance and is equipped with 192 GB of HBM3E memory offering a bandwidth of up to 7.37 TB/s. Ironwood pods scale up to 9,216 AI accelerators, delivering a total of 42.5 FP8 ExaFLOPS for training and inference, which far exceeds the FP8 capabilities of Nvidia's GB300 NVL72 system at 0.36 ExaFLOPS. The pod is interconnected using a proprietary 9.6 Tb/s Inter-Chip Interconnect network and carries roughly 1.77 PB of HBM3E memory in total, once again exceeding what Nvidia's competing platform can offer.

Ironwood pods -- built from Axion CPUs and Ironwood TPUs -- can be joined into clusters running hundreds of thousands of TPUs, which form part of Google's aptly dubbed AI Hypercomputer, an integrated supercomputing platform uniting compute, storage, and networking under one management layer. To boost the reliability of both ultra-large pods and the AI Hypercomputer, Google uses a reconfigurable fabric called Optical Circuit Switching, which instantly routes around any hardware interruption to sustain continuous operation. IDC data credits the AI Hypercomputer model with an average 353% three-year ROI, 28% lower IT spending, and 55% higher operational efficiency for enterprise customers.

Several companies are already adopting Google's Ironwood-based platform. Anthropic plans to use as many as one million TPUs to operate and expand its Claude model family, citing major cost-to-performance gains. Lightricks has also begun deploying Ironwood to train and serve its LTX-2 multimodal system.

Although AI accelerators like Google's Ironwood tend to steal all the thunder in the AI era of computing, CPUs are still crucially important for application logic and service hosting, as well as for running some AI workloads, such as data ingestion. So, along with its seventh-generation TPUs, Google is also deploying its first Armv9-based general-purpose processors, named Axion. Google has not published the full die specifications for its Axion CPUs: there is no confirmed core count per die (beyond up to 96 vCPUs and up to 768 GB of DDR5 memory for the C4A Metal instance), no disclosed clock speeds, and no publicly detailed process node. What we do know is that Axion is built around the Arm Neoverse V2 platform and is designed to offer up to 50% greater performance and up to 60% higher energy efficiency than modern x86 CPUs, as well as 30% higher performance than 'the fastest general-purpose Arm-based instances available in the cloud today.' There are reports that the CPU offers 2 MB of private L2 cache per core, 80 MB of L3 cache, support for DDR5-5600 MT/s memory, and Uniform Memory Access (UMA) for nodes.

Servers running Google's Axion CPUs and Ironwood TPUs come equipped with the company's custom Titanium-branded controllers, which offload networking, security, and storage I/O processing from the host CPU, improving both manageability and performance. In general, Axion CPUs can serve both AI servers and general-purpose servers for a variety of tasks.
For now, Google offers three Axion configurations: C4A, N4A, and C4A Metal. The C4A is the first and primary offering in Google's family of Axion-powered instances, and the only one that is generally available today. It provides up to 72 vCPUs, 576 GB of DDR5 memory, and 100 Gbps networking, paired with Titanium SSD storage of up to 6 TB of local capacity. The instance is optimized for sustained high performance across various applications. Next up is the N4A instance, which is also aimed at general workloads such as data processing, web services, and development environments, but it scales up to 64 vCPUs, 512 GB of DDR5 RAM, and 50 Gbps networking, making it a more affordable offering. The other preview model is C4A Metal, a bare-metal configuration that presumably exposes the full Axion hardware stack directly to customers: up to 96 vCPUs, 768 GB of DDR5 memory, and 100 Gbps networking. The instance is meant for specialized or license-restricted applications and Arm-native development.

These launches build upon a decade of Google's custom silicon development, which began with the original TPU and continued through YouTube's VCUs, Tensor mobile processors, and Titanium infrastructure. The Axion CPU -- Google's first Arm-based general-purpose server processor -- completes the portfolio of the company's custom chips, and the Ironwood TPU sets the stage for competition against the best AI accelerators on the market.
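For quick comparison, the three Axion shapes described above can be collected into one small lookup table. This is only a sketch built from the figures quoted in this article (local SSD capacity for N4A and C4A Metal is not specified here), not an official machine-type catalog.

```python
# Axion instance shapes as described in the article above.
axion_instances = {
    "C4A":       {"vcpus": 72, "ddr5_gb": 576, "network_gbps": 100,
                  "local_ssd_tb": 6,    "status": "generally available"},
    "N4A":       {"vcpus": 64, "ddr5_gb": 512, "network_gbps": 50,
                  "local_ssd_tb": None, "status": "preview"},  # SSD not specified
    "C4A Metal": {"vcpus": 96, "ddr5_gb": 768, "network_gbps": 100,
                  "local_ssd_tb": None, "status": "preview"},  # bare metal; SSD not specified
}

def smallest_fit(min_vcpus: int, min_mem_gb: int) -> str:
    """Return the smallest listed shape that meets the vCPU and memory floor."""
    fits = [(spec["vcpus"], name) for name, spec in axion_instances.items()
            if spec["vcpus"] >= min_vcpus and spec["ddr5_gb"] >= min_mem_gb]
    return min(fits)[1] if fits else "no listed shape fits"

print(smallest_fit(48, 400))  # -> N4A (64 vCPUs, 512 GB)
```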
[2]
TPU v7, Google's answer to Nvidia's Blackwell is nearly here
Chocolate Factory's homegrown silicon boasts Blackwell-level perf at massive scale

Opinion: Look out, Jensen! With its TPUs, Google has shown time and time again that it's not the size of your accelerators that matters but how efficiently you can scale them in production. Now with its latest generation of Ironwood accelerators slated for general availability in the coming weeks, the Chocolate Factory not only has scale on its side but a tensor processing unit (TPU) with the grunt to give Nvidia's Blackwell behemoths a run for their money.

First announced in April alongside a comically bad comparison to the El Capitan supercomputer -- no, an Ironwood TPU Pod is not 24x faster than the Department of Energy's biggest iron -- Google's TPU v7 accelerators are a major leap in performance over prior generations. Historically, Google's TPUs have paled in comparison to contemporary GPUs from the likes of Nvidia and more recently AMD in terms of raw FLOPS, memory capacity, and bandwidth, making up for this deficit by simply having more of them. Google has offered its TPUs in pods -- large, scale-up compute domains -- containing hundreds or even thousands of chips. If additional compute is needed, users can then scale out to multiple pods.

With TPU v7, Google's accelerators offer performance within spitting distance of Nvidia's Blackwell GPUs, when normalizing floating point perf to the same precision. Each Ironwood TPU boasts 4.6 petaFLOPS of dense FP8 performance, slightly higher than Nvidia's B200 at 4.5 petaFLOPS and just shy of the 5 petaFLOPS delivered by the GPU giant's more powerful and power-hungry GB200 and GB300 accelerators. Feeding that compute is 192 GB of HBM3e memory delivering 7.4 TB/s of bandwidth, which again puts it in the same ballpark as Nvidia's B200 at 192 GB of HBM and 8 TB/s of memory bandwidth. For chip-to-chip communication, each TPU features four ICI links, which provide 9.6 Tbps of aggregate bidirectional bandwidth, compared to 14.4 Tbps (1.8 TB/s) on the B200 and B300. Put simply, Ironwood is Google's most capable TPU ever, delivering performance 10x that of its TPU v5p, 4x that of its TPU v6e "Trillium" accelerators unveiled last year, and roughly matching that of Nvidia and AMD's latest chips.

But, as we alluded to earlier, Google's real trick is the ability to scale TPUs into truly enormous compute domains. Nvidia's NVL72 rack systems stitch 72 of its latest Blackwell accelerators into a single compute domain using its proprietary NVLink interconnect tech. AMD will do something similar with its Helios racks and the MI450 series next year. Ironwood, by comparison, is monstrous, with Google offering the chips in pods of 256 at the low end and 9,216 on the high end. If that isn't enough, users with sufficiently deep pockets can then scale out to additional pods. Back in April, Google told us that its Jupiter datacenter network tech could theoretically support compute clusters of up to 43 TPU v7 pods -- or roughly 400,000 accelerators. Having said that, while it may be supported, it's not clear just how big Google's TPU v7 clusters will be in practice. To be clear, compute clusters containing hundreds of thousands of Nvidia GPUs do exist and in fact have become commonplace. The difference is that, up until the Blackwell generation, these clusters have been built using eight-way GPU boxes arranged in massive scale-out domains. Nvidia's NVL72 increased the unit of compute by a factor of nine, but still falls far short of Google's TPU pods.
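A quick back-of-the-envelope check shows how the per-chip figures above roll up to the pod-level numbers quoted across these articles. The NVL72 line uses the roughly 5 petaFLOPS of dense FP8 per GB300 accelerator cited here, not an official Nvidia spec sheet.

```python
# Per-chip figures quoted above.
ironwood_fp8_pflops = 4.614   # dense FP8 petaFLOPS per Ironwood TPU
ironwood_hbm_gb = 192         # HBM3e capacity per chip
pod_chips = 9_216

pod_exaflops = ironwood_fp8_pflops * pod_chips / 1_000   # PFLOPS -> EFLOPS
pod_hbm_pb = ironwood_hbm_gb * pod_chips / 1_000_000     # GB -> PB (decimal)
print(f"Ironwood pod: {pod_exaflops:.1f} FP8 EFLOPS, {pod_hbm_pb:.2f} PB HBM")  # ~42.5, ~1.77

# Nvidia GB300 NVL72 for comparison: 72 accelerators at ~5 PFLOPS dense FP8 each.
nvl72_exaflops = 72 * 5 / 1_000
print(f"GB300 NVL72: {nvl72_exaflops:.2f} FP8 EFLOPS")           # ~0.36
print(f"Scale-up ratio: {pod_exaflops / nvl72_exaflops:.0f}x")   # ~118x
```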
Google's approach to scale-up compute fabrics differs considerably from Nvidia's. Where the GPU giant has opted for a large, relatively flat switch topology for its rack-scale platforms, which we've discussed at length here, Google employs a 3D torus topology, where each chip connects to the others in a three-dimensional mesh. The topology eliminates the need for high-performance packet switches, which are expensive, power hungry, and, under heavy load, can introduce unwanted latency. While a torus can eliminate switch latency, the mesh topology means more hops may be required for any one chip to talk to another. As the torus grows, so does the potential for chip-to-chip latency. By using switches, Nvidia and AMD are able to ensure their GPUs are at most two hops away from the next chip. As we understand it, which is better depends on the workload. Some workloads may benefit from large multi-hop topologies like the 2D and 3D toruses used in Google's TPU pods, while others may perform better on the smaller switched compute domains afforded by Nvidia and AMD's rack designs.

Because of this, Google employs a different kind of switching tech, which allows it to slice and dice its TPU pods into various shapes and sizes in order to better suit its own internal and customer workloads. Rather than the packet switches you may be familiar with, Google employs optical circuit switches (OCS). These are more akin to the telephone switchboards of the 20th century. OCS appliances use various methods, MEMS devices being one, to patch one TPU to another. And because this connection is usually made through a physical process connecting one port to another, it introduces little if any latency. As an added benefit, OCS also helps with fault tolerance, since if a TPU fails, the OCS appliances can drop it from the mesh and replace it with a working part. Google has been using 2D and 3D toruses in conjunction with OCS appliances in its TPU pods since at least 2021, when TPU v4 made its debut.

Google is also no stranger to operating massive compute fabrics in production. Its TPU v4 supported pods of up to 4,096 chips, while its TPU v5p more than doubled that to 8,960. So the jump to 9,216-chip pods with Ironwood shouldn't be a stretch for Google to pull off. The availability of these massive compute domains has certainly caught the attention of major model builders, including those for whom Google's Gemini models are a direct competitor. Anthropic is among Google's largest customers, having announced plans to utilize up to a million TPUs to train and serve its next generation of Claude models. Anthropic's embrace of Google's TPU tech isn't surprising when you consider that the model dev is also deploying its workloads across hundreds of thousands of Amazon's Trainium 2 accelerators under Project Rainier, which also utilize 2D and 3D torus mesh topologies in their compute fabrics. While Nvidia CEO Jensen Huang may play off the threat of AI ASICs to his GPU empire, it's hard to ignore the fact that chips from the likes of Google, Amazon, and others are catching up quickly in terms of hardware capabilities and network scalability, with software often ending up being the deciding factor.
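To make the hop-count trade-off concrete, here is a small, self-contained sketch (not Google's routing code) that computes worst-case and average shortest-path hops in a wrap-around 3D torus. The pod shapes are hypothetical factorizations of 256 and 9,216 chips; Google has not published Ironwood's exact torus dimensions, and OCS reconfiguration changes the effective shape in practice.

```python
def torus_hops(shape):
    """Worst-case and average shortest-path hop counts for a wrap-around torus.
    Each axis is an independent ring, so totals are sums of per-axis values."""
    worst, avg = 0, 0.0
    for size in shape:
        ring = [min(d, size - d) for d in range(size)]  # wrap in either direction
        worst += max(ring)
        avg += sum(ring) / size
    return worst, avg

# Hypothetical 3D shapes: 8*8*4 = 256 chips, 16*16*36 = 9,216 chips.
for shape in [(8, 8, 4), (16, 16, 36)]:
    chips = shape[0] * shape[1] * shape[2]
    worst, avg = torus_hops(shape)
    print(f"{chips:>5}-chip torus {shape}: worst {worst} hops, average {avg:.1f} hops")

# A switched fabric such as NVLink keeps any two GPUs within two hops
# (GPU -> switch -> GPU), at the cost of large, power-hungry packet switches.
```

Even the 9,216-chip shape stays within a few dozen hops in the worst case, but that is still far more than a switched fabric's two, which is why which design wins depends on how much of a workload's traffic is nearest-neighbor.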
[3]
Google's rolling out its most powerful AI chip, taking aim at Nvidia with custom silicon
Sundar Pichai, chief executive officer of Alphabet Inc., during the Bloomberg Tech conference in San Francisco, California, US, on Wednesday, June 4, 2025.

Google is making its most powerful chip yet widely available, the search giant's latest effort to try and win business from artificial intelligence companies by offering custom silicon. The company said on Thursday that the seventh generation of its Tensor Processing Unit (TPU), called Ironwood, will hit the market for public use in the coming weeks, after it was initially introduced in April for testing and deployment. The chip, built in-house, is designed to handle everything from the training of large models to powering real-time chatbots and AI agents. In connecting up to 9,216 chips in a single pod, Google says the new Ironwood TPUs eliminate "data bottlenecks for the most demanding models" and give customers "the ability to run and scale the largest, most data-intensive models in existence." Google is in the midst of an ultra high-stakes race, alongside rivals Microsoft, Amazon and Meta, to build out the AI infrastructure of the future. While the majority of large language models and AI workloads have relied on Nvidia's graphics processing units (GPUs), Google's TPUs fall into the category of custom silicon, which can offer advantages on price, performance and efficiency. TPUs have been in the works for a decade. Ironwood, according to Google, is more than four times faster than its predecessor, and major customers are already lining up. AI startup Anthropic plans to use up to 1 million of the new TPUs to run its Claude model, Google said. Alongside the new chip, Google is rolling out a suite of upgrades meant to make its cloud cheaper, faster, and more flexible, as it vies with larger cloud players Amazon Web Services and Microsoft Azure. In its earnings report last week, Google reported third-quarter cloud revenue of $15.15 billion, a 34% increase from the same period a year earlier. Azure revenue jumped 40%, while Amazon reported 20% growth for AWS. Google said it's signed more billion-dollar cloud deals in the first nine months of 2025 than in the previous two years combined. To meet soaring demand, Google upped the high end of its forecast for capital spending this year to $93 billion from $85 billion. "We are seeing substantial demand for our AI infrastructure products, including TPU-based and GPU-based solutions," CEO Sundar Pichai said on the earnings call. "It is one of the key drivers of our growth over the past year, and I think on a going-forward basis, I think we continue to see very strong demand, and we are investing to meet that."
[4]
Google debuts AI chips with 4X performance boost, secures Anthropic megadeal worth billions
Google Cloud is introducing what it calls its most powerful artificial intelligence infrastructure to date, unveiling a seventh-generation Tensor Processing Unit and expanded Arm-based computing options designed to meet surging demand for AI model deployment -- what the company characterizes as a fundamental industry shift from training models to serving them to billions of users. The announcement, made Thursday, centers on Ironwood, Google's latest custom AI accelerator chip, which will become generally available in the coming weeks. In a striking validation of the technology, Anthropic, the AI safety company behind the Claude family of models, disclosed plans to access up to one million of these TPU chips -- a commitment worth tens of billions of dollars and among the largest known AI infrastructure deals to date. The move underscores an intensifying competition among cloud providers to control the infrastructure layer powering artificial intelligence, even as questions mount about whether the industry can sustain its current pace of capital expenditure. Google's approach -- building custom silicon rather than relying solely on Nvidia's dominant GPU chips -- amounts to a long-term bet that vertical integration from chip design through software will deliver superior economics and performance.

Why companies are racing to serve AI models, not just train them

Google executives framed the announcements around what they call "the age of inference" -- a transition point where companies shift resources from training frontier AI models to deploying them in production applications serving millions or billions of requests daily. "Today's frontier models, including Google's Gemini, Veo, and Imagen and Anthropic's Claude train and serve on Tensor Processing Units," said Amin Vahdat, vice president and general manager of AI and Infrastructure at Google Cloud. "For many organizations, the focus is shifting from training these models to powering useful, responsive interactions with them." This transition has profound implications for infrastructure requirements. Where training workloads can often tolerate batch processing and longer completion times, inference -- the process of actually running a trained model to generate responses -- demands consistently low latency, high throughput, and unwavering reliability. A chatbot that takes 30 seconds to respond, or a coding assistant that frequently times out, becomes unusable regardless of the underlying model's capabilities. Agentic workflows -- where AI systems take autonomous actions rather than simply responding to prompts -- create particularly complex infrastructure challenges, requiring tight coordination between specialized AI accelerators and general-purpose computing.

Inside Ironwood's architecture: 9,216 chips working as one supercomputer

Ironwood is more than an incremental improvement over Google's sixth-generation TPUs. According to technical specifications shared by the company, it delivers more than four times better performance for both training and inference workloads compared to its predecessor -- gains that Google attributes to a system-level co-design approach rather than simply increasing transistor counts. The architecture's most striking feature is its scale. A single Ironwood "pod" -- a tightly integrated unit of TPU chips functioning as one supercomputer -- can connect up to 9,216 individual chips through Google's proprietary Inter-Chip Interconnect network operating at 9.6 terabits per second.
To put that bandwidth in perspective, it's roughly equivalent to downloading the entire Library of Congress in under two seconds. This massive interconnect fabric allows the 9,216 chips to share access to 1.77 petabytes of High Bandwidth Memory -- memory fast enough to keep pace with the chips' processing speeds. That's approximately 40,000 high-definition Blu-ray movies' worth of working memory, instantly accessible by thousands of processors simultaneously. "For context, that means Ironwood Pods can deliver 118x more FP8 ExaFLOPS versus the next closest competitor," Google stated in technical documentation. The system employs Optical Circuit Switching technology that acts as a "dynamic, reconfigurable fabric." When individual components fail or require maintenance -- inevitable at this scale -- the OCS technology automatically reroutes data traffic around the interruption within milliseconds, allowing workloads to continue running without user-visible disruption. This reliability focus reflects lessons learned from deploying five previous TPU generations. Google reported that its fleet-wide uptime for liquid-cooled systems has maintained approximately 99.999% availability since 2020 -- equivalent to less than six minutes of downtime per year.

Anthropic's billion-dollar bet validates Google's custom silicon strategy

Perhaps the most significant external validation of Ironwood's capabilities comes from Anthropic's commitment to access up to one million TPU chips -- a staggering figure in an industry where even clusters of 10,000 to 50,000 accelerators are considered massive. "Anthropic and Google have a longstanding partnership and this latest expansion will help us continue to grow the compute we need to define the frontier of AI," said Krishna Rao, Anthropic's chief financial officer, in the official partnership agreement. "Our customers -- from Fortune 500 companies to AI-native startups -- depend on Claude for their most important work, and this expanded capacity ensures we can meet our exponentially growing demand." According to a separate statement, Anthropic will have access to "well over a gigawatt of capacity coming online in 2026" -- enough electricity to power a small city. The company specifically cited TPUs' "price-performance and efficiency" as key factors in the decision, along with "existing experience in training and serving its models with TPUs." Industry analysts estimate that a commitment to access one million TPU chips, with associated infrastructure, networking, power, and cooling, likely represents a multi-year contract worth tens of billions of dollars -- among the largest known cloud infrastructure commitments in history. James Bradbury, Anthropic's head of compute, elaborated on the inference focus: "Ironwood's improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect."

Google's Axion processors target the computing workloads that make AI possible

Alongside Ironwood, Google introduced expanded options for its Axion processor family -- custom Arm-based CPUs designed for general-purpose workloads that support AI applications but don't require specialized accelerators. The N4A instance type, now entering preview, targets what Google describes as "microservices, containerized applications, open-source databases, batch, data analytics, development environments, experimentation, data preparation and web serving jobs that make AI applications possible."
The company claims N4A delivers up to 2X better price-performance than comparable current-generation x86-based virtual machines. Google is also previewing C4A metal, its first bare-metal Arm instance, which provides dedicated physical servers for specialized workloads such as Android development, automotive systems, and software with strict licensing requirements. The Axion strategy reflects a growing conviction that the future of computing infrastructure requires both specialized AI accelerators and highly efficient general-purpose processors. While a TPU handles the computationally intensive task of running an AI model, Axion-class processors manage data ingestion, preprocessing, application logic, API serving, and countless other tasks in a modern AI application stack. Early customer results suggest the approach delivers measurable economic benefits. Vimeo reported observing "a 30% improvement in performance for our core transcoding workload compared to comparable x86 VMs" in initial N4A tests. ZoomInfo measured "a 60% improvement in price-performance" for data processing pipelines running on Java services, according to Sergei Koren, the company's chief infrastructure architect.

Software tools turn raw silicon performance into developer productivity

Hardware performance means little if developers cannot easily harness it. Google emphasized that Ironwood and Axion are integrated into what it calls AI Hypercomputer -- "an integrated supercomputing system that brings together compute, networking, storage, and software to improve system-level performance and efficiency." According to an October 2025 IDC Business Value Snapshot study, AI Hypercomputer customers achieved on average 353% three-year return on investment, 28% lower IT costs, and 55% more efficient IT teams. Google disclosed several software enhancements designed to maximize Ironwood utilization. Google Kubernetes Engine now offers advanced maintenance and topology awareness for TPU clusters, enabling intelligent scheduling and highly resilient deployments. The company's open-source MaxText framework now supports advanced training techniques including Supervised Fine-Tuning and Generative Reinforcement Policy Optimization. Perhaps most significant for production deployments, Google's Inference Gateway intelligently load-balances requests across model servers to optimize critical metrics. According to Google, it can reduce time-to-first-token latency by 96% and serving costs by up to 30% through techniques like prefix-cache-aware routing. The Inference Gateway monitors key metrics including KV cache hits, GPU or TPU utilization, and request queue length, then routes incoming requests to the optimal replica. For conversational AI applications where multiple requests might share context, routing requests with shared prefixes to the same server instance can dramatically reduce redundant computation.
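Google has not published the Inference Gateway's internals, but the behavior described above (prefix-cache-aware, load-aware routing) can be sketched generically. The replica fields and scoring weights below are illustrative assumptions, not Google's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    cached_prefixes: set   # request prefixes currently resident in this replica's KV cache
    queue_length: int      # requests waiting to be served
    utilization: float     # accelerator utilization, 0.0 - 1.0

def route(prompt_prefix: str, replicas: list[Replica]) -> Replica:
    """Pick the replica with the lowest weighted cost: prefer a KV-cache hit,
    then low queue depth, then low utilization (weights are illustrative)."""
    def cost(r: Replica) -> float:
        cache_miss = 0.0 if prompt_prefix in r.cached_prefixes else 1.0
        return 3.0 * cache_miss + 0.5 * r.queue_length + 1.0 * r.utilization
    return min(replicas, key=cost)

replicas = [
    Replica("tpu-replica-a", {"chat:session-42"}, queue_length=4, utilization=0.80),
    Replica("tpu-replica-b", set(),               queue_length=1, utilization=0.30),
]
# A follow-up turn in session 42 routes to the replica already holding its prefix
# in KV cache, avoiding recomputation of the shared conversation context.
print(route("chat:session-42", replicas).name)  # -> tpu-replica-a
```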
The hidden challenge: powering and cooling one-megawatt server racks

Behind these announcements lies a massive physical infrastructure challenge that Google addressed at the recent Open Compute Project EMEA Summit. The company disclosed that it's implementing +/-400 volt direct current power delivery capable of supporting up to one megawatt per rack -- a tenfold increase from typical deployments. "The AI era requires even greater power delivery capabilities," explained Madhusudan Iyengar and Amber Huffman, Google principal engineers, in an April 2025 blog post. "ML will require more than 500 kW per IT rack before 2030." Google is collaborating with Meta and Microsoft to standardize electrical and mechanical interfaces for high-voltage DC distribution. The company selected 400 VDC specifically to leverage the supply chain established by electric vehicles, "for greater economies of scale, more efficient manufacturing, and improved quality and scale." On cooling, Google revealed it will contribute its fifth-generation cooling distribution unit design to the Open Compute Project. The company has deployed liquid cooling "at GigaWatt scale across more than 2,000 TPU Pods in the past seven years" with fleet-wide availability of approximately 99.999%. Water can transport approximately 4,000 times more heat per unit volume than air for a given temperature change -- critical as individual AI accelerator chips increasingly dissipate 1,000 watts or more.
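The "approximately 4,000 times" figure can be sanity-checked from textbook volumetric heat capacities, and the same constants show why a liquid loop handles a kilowatt-class chip with a modest flow rate; the 10 K coolant temperature rise below is an illustrative assumption, not a Google figure.

```python
# Volumetric heat capacity = density * specific heat capacity (per kelvin of temperature rise).
water = 1000.0 * 4186.0   # kg/m^3 * J/(kg*K) -> ~4.19 MJ/(m^3*K)
air = 1.2 * 1005.0        # kg/m^3 * J/(kg*K) -> ~1.21 kJ/(m^3*K)
print(f"Water moves ~{water / air:,.0f}x more heat per unit volume than air")  # ~3,500x

# Flow needed to remove 1,000 W with an assumed 10 K coolant temperature rise:
# P = rho * c * Q * dT  =>  Q = P / (rho * c * dT)
flow_m3_per_s = 1000.0 / (water * 10)
print(f"~{flow_m3_per_s * 60_000:.1f} liters per minute per 1 kW chip")        # ~1.4 L/min
```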
Custom silicon gambit challenges Nvidia's AI accelerator dominance

Google's announcements come as the AI infrastructure market reaches an inflection point. While Nvidia maintains overwhelming dominance in AI accelerators -- holding an estimated 80-95% market share -- cloud providers are increasingly investing in custom silicon to differentiate their offerings and improve unit economics. Amazon Web Services pioneered this approach with Graviton Arm-based CPUs and Inferentia / Trainium AI chips. Microsoft has developed Cobalt processors and is reportedly working on AI accelerators. Google now offers the most comprehensive custom silicon portfolio among major cloud providers. The strategy faces inherent challenges. Custom chip development requires enormous upfront investment -- often billions of dollars. The software ecosystem for specialized accelerators lags behind Nvidia's CUDA platform, which benefits from 15+ years of developer tools. And rapid AI model architecture evolution creates risk that custom silicon optimized for today's models becomes less relevant as new techniques emerge. Yet Google argues its approach delivers unique advantages. "This is how we built the first TPU ten years ago, which in turn unlocked the invention of the Transformer eight years ago -- the very architecture that powers most of modern AI," the company noted, referring to the seminal "Attention Is All You Need" paper from Google researchers in 2017. The argument is that tight integration -- "model research, software, and hardware development under one roof" -- enables optimizations impossible with off-the-shelf components. Beyond Anthropic, several other customers provided early feedback. Lightricks, which develops creative AI tools, reported that early Ironwood testing "makes us highly enthusiastic" about creating "more nuanced, precise, and higher-fidelity image and video generation for our millions of global customers," said Yoav HaCohen, the company's research director. Google's announcements raise questions that will play out over coming quarters. Can the industry sustain current infrastructure spending, with major AI companies collectively committing hundreds of billions of dollars? Will custom silicon prove economically superior to Nvidia GPUs? How will model architectures evolve? For now, Google appears committed to a strategy that has defined the company for decades: building custom infrastructure to enable applications impossible on commodity hardware, then making that infrastructure available to customers who want similar capabilities without the capital investment.

As the AI industry transitions from research labs to production deployments serving billions of users, that infrastructure layer -- the silicon, software, networking, power, and cooling that make it all run -- may prove as important as the models themselves. And if Anthropic's willingness to commit to accessing up to one million chips is any indication, Google's bet on custom silicon designed specifically for the age of inference may be paying off just as demand reaches its inflection point.
[5]
Google's Ironwood TPU To be Generally Available in Coming Weeks | AIM
TPU v7 offers a 10x peak performance improvement over TPU v5, and 4x better performance per chip for both training and inference workloads compared to TPU v6.

Google announced that Ironwood, its seventh generation of TPUs (tensor processing units), will be made generally available in the coming few weeks. This means that Google Cloud customers will be able to utilise TPU v7 for their AI workloads. The chip is claimed to offer a ten-fold peak performance improvement over TPU v5, and 4x better performance per chip for both training and inference workloads compared to TPU v6. TPUs are chips that are specifically designed to handle AI workloads. Besides providing them to customers on Google Cloud, the company also uses them to train and deploy the Gemini, Imagen, Veo and other families of its AI models. Large-scale Google Cloud customers have also utilised TPUs for their AI workloads. Anthropic, the company behind the Claude family of AI models, has long utilised TPUs via Google Cloud for its workloads and has recently expanded its partnership with Google to deploy over 1 million new TPUs. Indian multinational conglomerate Reliance recently unveiled its latest venture, Reliance Intelligence, which will use Google Cloud infrastructure running on TPUs. "With Ironwood, we can scale up to 9,216 chips in a superpod linked with breakthrough Inter-Chip Interconnect (ICI) networking at 9.6 Tb/s," said Google in the announcement. This also enables access to 1.77 petabytes (PBs) of shared High Bandwidth Memory (HBM). TPUs are also built to offer better performance efficiency compared to GPUs. A study from Google stated that TPU v4 is 1.2× to 1.7× faster than an NVIDIA A100 GPU and uses 1.3× to 1.9× less power. Recently, Google announced a new research initiative, Project Suncatcher, which aims to explore the feasibility of scaling AI compute in space using solar-powered satellite constellations equipped with TPUs. According to D.A. Davidson analysts, cited by MarketWatch, combining Google's TPU business with its DeepMind AI research unit could be valued at around $900 billion. If Google eventually offers TPUs as hardware systems outside of Google Cloud, industry experts believe it could provide serious competition to the GPU market, including players like NVIDIA and AMD.
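The TPU v4 versus A100 study cited above implies a performance-per-watt range. The arithmetic below assumes that "uses 1.3x to 1.9x less power" means the power draw divided by that factor, which is a reading of the claim rather than a figure taken from the paper.

```python
# Ranges quoted above for TPU v4 relative to an NVIDIA A100.
speedup = (1.2, 1.7)           # x faster
power_reduction = (1.3, 1.9)   # x less power drawn (assumed meaning: power / factor)

# Perf per watt scales with speed gained times power saved.
low = speedup[0] * power_reduction[0]
high = speedup[1] * power_reduction[1]
print(f"Implied perf-per-watt advantage: {low:.2f}x to {high:.2f}x")  # ~1.6x to ~3.2x
```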
[6]
Google unleashes Ironwood TPUs, new Axion instances as AI inference demand surges - SiliconANGLE
Google LLC today announced it's bringing its custom Ironwood chips online for cloud customers, unleashing tensor processing units that can scale up to 9,216 chips in a single pod to become the company's most powerful AI accelerator architecture to date. The new chips will be available to customers in the coming weeks, alongside new Arm-based Axion instances that promise up to twice the price-performance of current x86-based alternatives. Google's own frontier models, including Gemini, Veo and Imagen, are trained and deployed using TPUs, alongside equally sizable third-party models such as Anthropic PBC's Claude. The company said the advent of AI agents, which require deep reasoning and advanced task management, is defining a new era where inference -- the runtime intelligence of active models -- has greatly increased the demand for AI compute.

The tech giant debuted Ironwood at Google Cloud Next 2025 in April and touted it as the most powerful TPU accelerator the company has ever built. The next-generation architecture allows the company to scale up to 9,216 chips in a single server pod, linked together with inter-chip interconnect to provide up to 9.6 terabits per second of bandwidth. The chips can be connected to a colossal 1.77 petabytes of shared high-bandwidth memory (HBM). Inter-chip interconnect, or ICI, acts as a "data highway" for chips, allowing them to think and act as a single AI accelerator brain. This is important because modern-day AI models require significant processing power, but they can't fit on single chips and must be split up across hundreds or thousands of processors for parallel processing. Just like thousands of buildings crammed together in a city, the biggest problem this kind of system faces is traffic congestion. With more bandwidth, the chips can talk faster and with less delay. HBM holds the vast amount of real-time data AI models need to "remember" when training or processing queries from users. According to Google, the 1.77 petabytes of accessible data in a single, unified system is industry-leading. A single petabyte, or 1,000 terabytes, can represent around 40,000 high-definition Blu-ray movies or the text of millions of books. Making all of this accessible at once lets AI models respond instantly and intelligently with enormous amounts of knowledge. The company said the new Ironwood-based pod architecture can deliver 118x more FP8 ExaFLOPS than the nearest competitor and 4x better performance for training and inference than Trillium, the previous generation of TPU.

Google has also built a new software layer, co-designed with the hardware, to maximize Ironwood's capabilities and memory. This includes a new Cluster Director capability in Google Kubernetes Engine, which enables advanced maintenance and topology awareness for better process scheduling. For pretraining and post-training, the company announced enhancements to MaxText, a high-performance, open-source large language model training framework, for implementing reinforcement learning techniques. Google also recently announced upgrades to vLLM to support inference switching between GPUs and TPUs, or a hybrid approach. Anthropic, an early user of Ironwood, said that the chips provided impressive price-performance gains, allowing it to serve massive Claude models at scale. The leading AI model developer and provider announced late last month that it plans to access up to 1 million TPUs.
"Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work," Anthropic's Head of Compute James Bradbury said. "As demand continues to grow exponentially, we're increasing our compute resources as we push the boundaries of AI research and product development." Google also announced the expansion of its Axion offerings with two new services in preview: N4A, its second-generation Axion virtual machines, and C4A metal, the company's first Arm Ltd.-based bare-metal instances. Axion is the company's custom Arm-based central processing unit, designed to provide energy-efficient performance for general-purpose workloads. Google executives noted that the key to Axion's design philosophy is its compatibility with the company's workload-optimized infrastructure strategy. It uses Arm's expertise in efficient CPU design to deliver significant performance and power use enhancements over traditional x86 processors. "The Axion processors will have 30% higher performance than the fastest Arm processors available in the cloud today," Mark Lohmeyer, vice president and general manager of AI and computing infrastructure at Google Cloud, said in an exclusive broadcast on theCUBE, SiliconANGLE Media's livestreaming studio, during Google Cloud Next 2024. "They'll have 50% higher performance than comparable x86 generation processors and 60% better energy efficiency than comparable x86-based instances." Axion provides greatly increased efficiency for modern general-purpose AI workflows and it can be coupled with the new specialized Ironwood accelerators to handle complex model serving. The new Axion instances are designed to provide operational backbone, such as high-volume data preparation, ingestion, analytics and running the virtual services that host intelligent applications. N4A instances support up to 64 virtual CPUs and 512 gigabytes of DDR5 memory, with support for custom machine types. The new C4A metal delivers dedicated physical servers with up to 96 vCPUs and 768 gigabytes of memory. These two new services join the company's previously announced C4A instances designed for consistent high performance.
[7]
Google to offer Ironwood TPU for public use; Anthropic among first major clients
Ironwood, the seventh-generation tensor processing unit (TPU) launched by the search giant in April this year for testing and deployment, is built by linking up to 9,216 chips in one pod. It removes data bottlenecks to let customers run and scale the largest, most data-intensive models.

Tech giant Google will release Ironwood, its specialised chip designed to run artificial intelligence (AI) models, for public use in the coming weeks. Per a report by CNBC on Thursday, Google will also announce upgrades to Cloud, making it "cheaper, faster, and more flexible." The report added that AI startup Anthropic is also planning to use up to one million of the new TPUs to run its Claude model. Ironwood is the seventh-generation Tensor Processing Unit (TPU) launched by the search giant in April this year for testing and deployment. Google's in-house Ironwood TPU is built to support both training and real-time AI workloads, including chatbots and AI agents. By linking up to 9,216 chips in one pod, it removes data bottlenecks and lets customers run and scale the largest, most data-intensive models. Google competes with the likes of Microsoft, Amazon, and Meta to build next-generation AI infrastructure. While most major AI models still run on Nvidia GPUs, Google's custom TPU chips offer potential advantages in cost, performance, and efficiency, the company had said earlier in a blog post.

Technical insights
* It is an enormous cluster of up to 9,216 liquid-cooled chips working together as a single unit.
* The chips are linked with inter-chip interconnect (ICI) networking in a pod configuration that draws nearly 10 megawatts of power.
* For customers, the chip is available in two scalable configurations: 256 chips or a full 9,216-chip cluster.

Key features of Ironwood
* Ironwood is built to handle the heavy computation and communication needs of advanced "thinking models" such as large language models (LLMs), mixture-of-experts (MoE) models, and advanced reasoning systems.
* It allows AI workloads to run more cost-effectively. Google claims that Ironwood is nearly 30x more power efficient than its first Cloud TPU from 2018.
* It offers 192 GB of memory per chip -- six times more than Google's sixth-generation TPU, Trillium, announced last year -- making it easier to process larger models and data sets.

Google parent Alphabet reported its first-ever $100 billion quarterly revenue on October 30, led by strong growth across its core search business and a rapidly expanding cloud division buoyed by AI. The company's ambitious approach to offering AI "is delivering strong momentum, and we're shipping at speed," CEO Sundar Pichai said.
[8]
Hot Take: The True AI Chip Challenge for NVIDIA Isn't from AMD or Intel -- It's Google's TPUs Heating Up the Race
NVIDIA's biggest competition in the AI industry is not currently AMD or Intel, but Google, which has emerged as a formidable rival and is catching up in the race. Interestingly, NVIDIA's CEO, Jensen Huang, is already aware of it. This might seem a bit surprising at first, but Google is one of the earliest competitors in the race for AI hardware, introducing its very first custom TPU AI chip back in 2016, well before AMD, NVIDIA, and Intel were building silicon aimed squarely at this market. The tech giant introduced its newest '7th-generation' Ironwood TPUs last week, which took the industry by storm and solidified the idea of 'NVIDIA vs Google' as the most competitive AI race. We'll discuss in depth why we say this, starting with how Google's latest Ironwood TPUs compare to NVIDIA's offerings.

Let's talk about Google's Ironwood TPUs, which are now expected to be available across workloads in the coming weeks. The firm labels the chip as an 'inference-focused' option, claiming that it will usher in a new era of inferencing performance across general-purpose compute. The TPU v7 (Ironwood) is positioned to capitalize on the shift from model training to inference, which is why its onboard specifications are designed to ensure it excels in the "age of inference".

Diving into the specifications, each Ironwood chip carries 192 GB of HBM memory with 7.4 TB/s of bandwidth and a massive 4,614 TFLOPS of peak compute, almost a 16x increase compared to TPU v4. More importantly, with Ironwood's TPU Superpod, the firm brings in 9,216 chips per pod, resulting in a cumulative 42.5 exaFLOPS of aggregate FP8 compute. The chip count in the Superpod shows that Google has an effective interconnect solution onboard, one that has actually managed to surpass NVLink in terms of scalability. Speaking of interconnect, Google employs the Inter-Chip Interconnect (ICI), a scale-up network that enables it to go all the way to 43 blocks of Superpods (each block consisting of 64 chips) connected via a 1.8-petabyte network. Internal communications are handled using a range of NICs, and Google utilizes a 3D torus layout for its TPUs, which enables high-density interconnect across large numbers of chips. Compared to NVLink, scalability and interconnect density are where Google wins, which is why the Superpod is positioned to be a disruptive offering.

Let's examine why Ironwood TPUs are expected to be significant in the age of inference. But before that, it's important to note why 'thinking models' are the next big thing. Model training has been the dominant trend in the AI industry, which is why NVIDIA's compute portfolio was the primary option for Big Tech, as it offered better performance across scenarios suited to training environments. However, now that mainstream models are already deployed, the number of inference queries can vastly exceed the number of training tasks. In inferencing, it's not just about getting the most TFLOPS; other metrics become pivotal, such as latency, throughput, efficiency, and cost per query, and when you look at what Google offers with Ironwood, the idea of Google outpacing NVIDIA in the AI race becomes a lot more evident. Firstly, Ironwood features massive on-package memory, equivalent to that of NVIDIA's Blackwell B200 AI GPUs.
However, when you factor in a SuperPod cluster with 9,216 chips in a single environment, the available memory capacity far exceeds what any current NVIDIA scale-up domain offers. Higher memory capacity is significantly more important for inference, as it reduces inter-chip communication overhead and improves latency for large models, which is one of the reasons Ironwood is a more attractive option. Ironwood's architecture is explicitly designed for inference, meaning Google has focused on low latency backed by high power efficiency; power is arguably the second most important factor behind Ironwood's potential success. Inference at hyperscale means you need thousands of chips servicing queries in an environment that operates 24/7, and cloud service providers tend to weigh deployment and running costs more heavily than peak performance once inference dominates. This is why, with Ironwood, Google has achieved 2x higher power efficiency compared to the previous generation, making the deployment of Google's TPUs across inference workloads a more sensible move. The race in AI is shifting from "who has more FLOPS" to "who can serve more queries, with lower latency, at lower cost, and with less power", which has opened up a new competitive axis against NVIDIA that Google is looking to capture early on. More importantly, Ironwood is said to be offered exclusively through Google Cloud, which could create an ecosystem lock-in -- a potentially fatal blow to Team Green's long-standing AI dominance. There's no doubt that Google's TPUs are proving more competitive with each iteration, and that should ring bells within NVIDIA's camp. Sure, NVIDIA isn't staying silent in the dawn of inferencing -- with Rubin CPX, the firm plans to offer a 'sweet spot' within Rubin's rack-scale solutions -- but it's increasingly evident that Google is positioning itself as the 'true rival' to NVIDIA, with Intel and AMD lagging behind for now. Jensen Huang himself has acknowledged in the past, on the BG2 podcast, that Google's custom silicon is a competitive offering.
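To make 'cost per query' concrete, here is a toy calculation. The throughput, power draw, and electricity price are hypothetical round numbers chosen purely for illustration, not Ironwood or Blackwell figures; the point is that at 24/7 utilization, energy per query scales the bill across the whole fleet.

```python
# Hypothetical serving parameters for one accelerator -- illustrative only.
queries_per_second = 50            # sustained inference throughput per chip
chip_power_watts = 1_000           # board power under load
electricity_usd_per_kwh = 0.08     # assumed bulk datacenter rate

hours_per_month = 24 * 30
queries_per_month = queries_per_second * 3600 * hours_per_month
energy_kwh_per_month = chip_power_watts / 1_000 * hours_per_month

energy_cost = energy_kwh_per_month * electricity_usd_per_kwh
per_million = energy_cost / (queries_per_month / 1_000_000)
print(f"Energy cost per million queries: ${per_million:.3f}")  # ~$0.44 with these inputs

# Doubling power efficiency halves this line item on every chip that runs 24/7,
# which is why efficiency, not peak FLOPS, drives inference economics at scale.
```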
[9]
Google's Latest AI Chip Puts the Focus on Inference | The Motley Fool
Google announced on Thursday that Ironwood, its seventh-generation TPU, would be available for Google Cloud customers in the coming weeks. The company also disclosed that its new Arm-based Axion virtual machine instances are currently in preview, unlocking major improvements in performance per dollar. With these new cloud products, Google is aiming to lower costs for AI inference and agentic AI workloads. While Google's Ironwood TPU can handle AI training tasks, which involve ingesting massive amounts of data to train an AI model, it's also well suited for high-volume AI inference workloads. "It offers a 10X peak performance improvement over TPU v5p and more than 4X better performance per chip for both training and inference workloads compared to TPU v6e (Trillium), making Ironwood our most powerful and energy-efficient custom silicon to date," according to Google's blog post announcing the upcoming launch of Ironwood. While new AI models will still need to be trained, Google sees the balance shifting toward AI inference workloads. AI inference is the act of using a trained AI model to generate a response, and it's less computationally intensive than AI training workloads. However, AI chips meant for inference tasks need to have quick response times and be capable of handling a high volume of requests. Google is calling the new era in the AI industry the "age of inference," where organizations shift focus from training AI models to using those models to perform useful tasks. Agentic AI, the current buzzword in the industry, is ultimately just a string of AI inference tasks. Google expects near-exponential growth in demand for compute as AI is increasingly put to use. For AI companies like Anthropic, which recently signed a deal to expand its usage of Google's TPUs for both training and inference, efficiency is critical. Anthropic will have access to 1 million TPUs under the new deal, which will help it push toward its goal of growing revenue to $70 billion and becoming cash-flow-positive in 2028. The efficiency of Google's new TPUs was likely a key selling point. Google's cloud computing business has always lagged behind Microsoft Azure and Amazon Web Services, but AI could help the company catch up. Microsoft and Amazon are also aggressively building out AI computing capacity, and each also designs its own custom AI chips. Google Cloud, while smaller, is growing quickly and gaining ground on AWS. In the third quarter, Google Cloud produced revenue of $15.2 billion, up 34% year over year. The business also produced operating income of $3.6 billion, good for an operating margin of roughly 24%. Meanwhile, AWS grew revenue by 20% to $33 billion in the third quarter, while Azure and other cloud services for Microsoft grew by 40%. As more organizations move from experimenting with AI to deploying real AI workloads that demand significant amounts of AI inference capacity, Google is set to benefit with its massive fleet of TPUs. Google has been working on these chips for a decade, potentially giving it an edge as demand for AI computing capacity explodes.
Google Cloud introduces its most powerful AI infrastructure yet with Ironwood TPU v7 chips offering 4x performance gains and massive scaling capabilities up to 9,216 chips per pod. Anthropic commits to using up to 1 million TPUs in a multi-billion dollar deal.
Google Cloud has unveiled Ironwood, its seventh-generation Tensor Processing Unit (TPU), marking a significant leap in the company's custom silicon capabilities. The chip will become generally available in the coming weeks, representing Google's most ambitious effort yet to challenge Nvidia's dominance in the AI accelerator market [1][3].
Ironwood delivers more than four times better performance for both training and inference workloads compared to its predecessor, TPU v6, and offers a ten-fold peak performance improvement over TPU v5 [4][5]. Each Ironwood TPU boasts 4.6 petaFLOPS of dense FP8 performance, positioning it competitively against Nvidia's Blackwell GPUs at 4.5 petaFLOPS [2].

The architecture's most striking feature is its unprecedented scale. A single Ironwood pod can connect up to 9,216 individual chips through Google's proprietary Inter-Chip Interconnect network operating at 9.6 terabits per second [1]. This massive interconnect fabric provides access to 1.77 petabytes of High Bandwidth Memory, delivering a total of 42.5 FP8 ExaFLOPS for training and inference [1].
This scale far exceeds Nvidia's competing platforms. While Nvidia's GB300 NVL72 system delivers 0.36 ExaFLOPS, Google's Ironwood pods achieve 118 times more FP8 ExaFLOPS performance [1][4]. Google's Jupiter datacenter network technology could theoretically support compute clusters of up to 43 TPU v7 pods, encompassing roughly 400,000 accelerators [2].

In a striking validation of Ironwood's capabilities, Anthropic has committed to accessing up to one million TPU chips, representing one of the largest known AI infrastructure deals, worth tens of billions of dollars [3][4]. The AI safety company plans to use these TPUs to operate and expand its Claude model family, citing major cost-to-performance gains [1].
Other companies are also adopting Google's platform. Lightricks has begun deploying Ironwood to train and serve its LTX-2 multimodal system, while Indian conglomerate Reliance recently unveiled Reliance Intelligence, which will utilize Google Cloud infrastructure running on TPUs [1][5].
Google employs a unique 3D torus topology for its TPU pods, where each chip connects to others in a three-dimensional mesh, eliminating the need for expensive, power-hungry packet switches [2]. While this approach may require more hops for chip-to-chip communication compared to Nvidia's switched topology, it enables the massive scaling capabilities that define Google's approach.

To ensure reliability at this unprecedented scale, Google uses Optical Circuit Switching technology that acts as a dynamic, reconfigurable fabric [1]. When components fail, the system automatically reroutes data traffic around interruptions within milliseconds, maintaining continuous operation. Google reports fleet-wide uptime of approximately 99.999% for its liquid-cooled systems since 2020 [4].
Summarized by Navi