Curated by THEOUTPOST
On Wed, 4 Dec, 12:02 AM UTC
10 Sources
[1]
AWS building ExaFLOPS-class supercomputer for AI with hundreds of thousands of homegrown Trainium2 processors -- AWS forges a path without Nvidia GPUs
When we write about AI supercomputers with tens or even hundreds of thousands of processors, we usually mean systems powered by Nvidia's Hopper or Blackwell GPUs. But Nvidia is not alone in addressing ultra-demanding AI supercomputers: Amazon Web Services said this week that it is building a machine with hundreds of thousands of its Trainium2 processors to achieve roughly 65 ExaFLOPS of AI performance. The company also unveiled its Trainium3 processor, which will quadruple performance compared to Trainium2.

The AWS Trainium2 is Amazon's second-generation AI accelerator, designed for foundation models (FMs) and large language models (LLMs) and developed by Amazon's Annapurna Labs. The unit is a multi-tile system-in-package with two compute tiles, 96GB of HBM3 in four stacks, and two static chiplets for package uniformity. When AWS introduced Trainium2 last year, it did not share specific per-chip performance figures, but it stated that Trn2 instances could scale up to 100,000 processors while delivering 65 ExaFLOPS of low-precision compute performance for AI, which implies up to 650 TFLOPS per chip. It looks like that was a conservative estimate.

At its re:Invent 2024 conference, AWS made three Trainium2-related announcements.

First, AWS Trainium2-based Amazon Elastic Compute Cloud (Amazon EC2) Trn2 instances are now generally available. These instances feature 16 Trainium2 processors interconnected with NeuronLink and deliver up to 20.8 FP8 PetaFLOPS of performance and 1.5 TB of HBM3 memory with a peak bandwidth of 46 TB/s. That works out to up to 1.3 PetaFLOPS of FP8 performance per Trainium2, twice the figure discussed last year. Perhaps AWS found a way to optimize the processor's performance, or perhaps it previously cited FP16 numbers; in any case, 1.3 PetaFLOPS of FP8 compute is comparable to the Nvidia H100's FP8 performance of 1.98 PetaFLOPS (without sparsity).

Second, AWS is building EC2 Trn2 UltraServers with 64 interconnected Trainium2 chips that offer 83.2 FP8 PetaFLOPS of performance as well as 6 TB of HBM3 memory with a peak bandwidth of 185 TB/s. The machines use 12.8 Tb/s Elastic Fabric Adapter (EFA) networking for interconnection.

Finally, AWS and Anthropic are building a gigantic EC2 UltraCluster of Trn2 UltraServers, codenamed Project Rainier. The system will be powered by hundreds of thousands of Trainium2 processors and will deliver five times the ExaFLOPS that Anthropic currently uses to train its leading AI models, such as Sonnet and Opus. The machine is expected to be interconnected with third-generation, low-latency, petabit-scale EFA networking. AWS does not disclose how many Trainium2 processors the EC2 UltraCluster will use, but assuming the maximum scalability of a Trn2 deployment, 100,000 processors, that points to a system delivering around 130 FP8 ExaFLOPS, roughly the equivalent of some 32,768 Nvidia H100 processors.

"Trainium2 is purpose-built to support the largest, most cutting-edge generative AI workloads, for both training and inference, and to deliver the best price performance on AWS," said David Brown, vice president of Compute and Networking at AWS. "With models approaching trillions of parameters, we understand customers also need a novel approach to train and run these massive workloads.
New Trn2 UltraServers offer the fastest training and inference performance on AWS and help organizations of all sizes train and deploy the world's largest models faster and at a lower cost."

In addition, AWS introduced its next-generation Trainium3 processor, which will be made on TSMC's 3nm-class process technology, deliver higher performance than its predecessor, and become available to AWS customers in late 2025. Amazon expects Trn3 UltraServers to be four times faster than Trn2 UltraServers, which works out to 332.8 FP8 PetaFLOPS per machine and 5.2 FP8 PetaFLOPS per processor if the processor count remains at 64.
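To make the scaling math above easier to follow, here is a minimal Python sketch that reproduces the per-chip, per-UltraServer, and cluster-level figures; the 100,000-chip cluster size is the article's assumption for Project Rainier rather than a confirmed AWS number, and the H100 comparison relies on Nvidia's published FP8 ratings.

```python
# Back-of-the-envelope scaling for the figures reported above.
# The 100,000-chip cluster size is an assumption made in the article,
# not a confirmed AWS figure.

TRN2_INSTANCE_PFLOPS_FP8 = 20.8   # dense FP8 per 16-chip Trn2 instance
CHIPS_PER_INSTANCE = 16
CHIPS_PER_ULTRASERVER = 64
ASSUMED_CLUSTER_CHIPS = 100_000   # assumed size of the Project Rainier cluster

per_chip_pflops = TRN2_INSTANCE_PFLOPS_FP8 / CHIPS_PER_INSTANCE        # 1.3 PFLOPS
ultraserver_pflops = per_chip_pflops * CHIPS_PER_ULTRASERVER           # 83.2 PFLOPS
cluster_eflops = per_chip_pflops * ASSUMED_CLUSTER_CHIPS / 1_000       # ~130 EFLOPS

# The "around 32,768 H100s" comparison only works out if each H100 is
# counted at roughly 4 PFLOPS of FP8 (its sparse rating), rather than the
# 1.98 PFLOPS dense figure quoted earlier.
h100_equivalents = cluster_eflops * 1_000 / (1.98 * 2)                 # ~32,800

print(per_chip_pflops, ultraserver_pflops, cluster_eflops, round(h100_equivalents))
```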
[2]
Amazon reveals next-gen AI silicon, turns Trainium2 loose
Tens of thousands of AWS' Trn2 instances to fuel Anthropic's next-gen models

Amazon Web Services teased its next-gen AI accelerator, dubbed Trainium3, at re:Invent on Tuesday, saying it will deliver 4x higher performance than its predecessor when it arrives late next year. Details on the part are still quite thin. However, speaking with The Register ahead of re:Invent, Gadi Hutt, director of product and customer engineering for AWS' Annapurna Labs team, said he expects Trainium3 to be the first dedicated machine learning accelerator built on a 3nm process node and to achieve a 40 percent improvement in efficiency compared to Trainium2, which, a year after its own paper launch, is entering general availability -- more on that in a bit.

Amazon remains vague about actual performance figures. Trainium3's 4x performance improvement is based on a complete "UltraServer" configuration, which we're told is still in development. What we do know is that the Trainium2 UltraServer, which features 64 accelerators in total, delivers 83.2 petaFLOPS of dense FP8 performance. So in theory, a Trainium3 UltraServer should deliver 332.8 petaFLOPS of compute, though it isn't clear at what precision. We've reached back out to AWS for clarification, but if we had to guess, we're probably looking at either 6-bit or 4-bit floating point math -- something that Nvidia is bringing to market with Blackwell and AMD plans to introduce with the MI355X sometime next year. Factor in sparsity, and Amazon's next-gen UltraServers could potentially deliver more than 1.3 exaFLOPS of AI compute, assuming Trainium3 supports the same 4x multiplier as its predecessor.

We've also been assured that these performance claims refer to peak compute performance -- aka FLOPS -- and not some nebulous AI benchmark. This is an important detail because, depending on the AI workload, performance hinges on a number of factors beyond FLOPS. An increase in memory bandwidth, for instance, can yield large gains in large language model (LLM) inference performance, something we've previously seen with Nvidia's bandwidth-boosted H200 chips. While Amazon is willing to tease performance and efficiency metrics, it has yet to share details on the chip's memory configuration. If we had to guess, we'll get more detail on the part right around the time Amazon is ready to tease its next generation of AI ASICs.

While we wait for more details on Trainium3, Amazon is bringing its second generation of Trainium compute services to the general market. Teased at re:Invent last year, Trainium2, which despite its name is both a training and an inference chip, delivers 1.3 petaFLOPS of dense FP8 compute and 96 gigabytes of high-bandwidth memory with 2.9 TBps of bandwidth. For reference, a single Nvidia H100 boasts just under 2 petaFLOPS of dense FP8 performance, 80GB of HBM, and 3.35 TBps of bandwidth. The chip itself is composed of a pair of 5nm compute dies integrated using TSMC's chip-on-wafer-on-substrate (CoWoS) packaging tech along with four 24GB HBM stacks. Similar to Google's Tensor Processing Units (TPUs), these accelerators are bundled into rack-scale clusters, with 64 Trainium2 parts spread across two interconnected racks. As we mentioned earlier, this Trn2 UltraServer configuration is capable of churning out 83.2 petaFLOPS of dense FP8 performance, or 332.8 petaFLOPS with its 4x sparsity mode enabled.
If that's more compute than you're looking for, Amazon also offers a Trn2 instance with 16 accelerators and about 20.8 petaFLOPS of dense compute. According to Amazon, these instances offer 30 to 40 percent better price-performance than the current generation of GPU-based instances available on EC2 -- specifically its Nvidia H200-based P5e and P5en instances. For those using the chips to train models, Trainium2 can scale to even larger clusters with 100,000 or more chips. This is exactly what AWS and model builder Anthropic plan to do under Project Rainier, which will involve "hundreds of thousands" of Trainium2 chips producing "5x the number of exaFLOPS used to train their latest generation of AI models."

Trn2 instances are now available in AWS' US East (Ohio) region, with availability in additional regions coming in the near future. Meanwhile, the larger Trn2 UltraServer configuration is currently available in preview.

While AWS' Annapurna Labs team pushes ahead with custom silicon, it isn't putting all of its eggs in one basket. The cloud giant already supports a wide variety of instances with Nvidia H200, L40S, and L4 accelerators, and it is in the process of deploying a massive cluster of Blackwell parts under Project Ceiba. Based on Nvidia's Grace-Blackwell Superchips (GB200), the massive AI supercomputer will boast some 20,736 Blackwell GPUs, each connected by 800 Gbps (1.6 Tbps per Superchip) of Elastic Fabric Adapter bandwidth. In total, the machine is expected to produce roughly 414 exaFLOPS of super-low-precision sparse FP4 compute. However, we'll note that that precision will almost exclusively be used for inferencing, with higher-precision FP8 and FP16/BF16 used for training. For training, we expect Ceiba will still deliver a whopping 51 exaFLOPS of dense BF16 compute, or twice that if you're willing to drop down to FP8. In any case, while AWS may be pushing ahead with its Trainium silicon, it's by no means done with Nvidia just yet. ®
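The Register's point above about memory bandwidth often mattering more than raw FLOPS for LLM inference can be made concrete with a simple roofline-style estimate. The sketch below is illustrative only: the 70-billion-parameter model and 1-byte (FP8) weights are assumptions of ours, while the 2.9 TB/s per-chip HBM bandwidth is the Trainium2 figure cited above.

```python
# Rough upper bound on batch-1 decode speed when token generation is
# memory-bandwidth bound: every weight must be streamed from HBM once per token.
# Model size and weight precision are illustrative assumptions, not AWS figures.

params = 70e9            # assumed 70B-parameter model
bytes_per_param = 1      # assumed FP8 weights (1 byte each)
hbm_bandwidth = 2.9e12   # bytes/s per Trainium2 chip, per the article

bytes_per_token = params * bytes_per_param
tokens_per_second = hbm_bandwidth / bytes_per_token

print(f"~{tokens_per_second:.0f} tokens/s per chip, bandwidth-limited")  # ~41 tokens/s
```

By this measure, raising memory bandwidth lifts the ceiling on decode throughput directly, which is why bandwidth-boosted parts such as the H200 show sizable inference gains even at similar FLOPS.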
[3]
Amazon teases its next-gen Trainium3 AI accelerator as 4x faster than Trainium2, drops in late 2025
Amazon Web Services (AWS) has teased its next-gen Trainium3 AI accelerator at re:Invent on Tuesday, promising 4x higher performance than its current-gen Trainium2 chip. The new Trainium3 AI accelerator is due in late 2025, with Gadi Hutt, director of product and customer engineering for AWS' Annapurna Labs team, expecting the new AI accelerator to be the very first dedicated machine learning accelerator built on a 3nm process node (at TSMC) and to hit a 40% improvement in efficiency over Trainium2.

Amazon hasn't been too clear on the exact performance of Trainium3, but the 4x performance improvement figure is based on AWS' complete "UltraServer" configuration, which The Register reports is still in development. The outlet works out that the Trainium2 UltraServer features 64 accelerators, capable of 83.2 petaFLOPS of compute performance (precision unspecified). In theory, explains The Register, the new AWS Trainium3 AI accelerators in a next-gen UltraServer could push out 332.8 petaFLOPS of compute power, but once again, we don't know at what precision (an important detail, especially when comparing against other AI accelerators and AI GPUs from the likes of NVIDIA, AMD, and others in the AI chip space).

What would AWS use its new Trainium3 AI accelerators for? The company is expected to push out a new text-to-video model codenamed "Olympus", which is rumored to be unveiled any day now.
[4]
AWS unveils next-gen Trainium3 custom AI chips and cloud Trainium2 instances - SiliconANGLE
AWS unveils next-gen Trainium3 custom AI chips and cloud Trainium2 instances

Amazon Web Services Inc. today unveiled Trainium3, its next-generation custom chip for high-efficiency artificial intelligence training and delivery, and announced the general availability of AWS Trainium2-powered cloud instances that will put high-performance AI capabilities in the hands of customers.

Amazon revealed Trainium3 today during AWS re:Invent, the company's annual conference on cloud computing, saying that it will be the first AWS chip made with a three-nanometer process, setting a new standard for power efficiency and density. The chips will provide two times more performance and 40% better energy efficiency than the current Trainium2 chips.

The Trainium family of custom silicon by AWS allows enterprises to keep up with the rapidly increasing size of the AI foundation models and large language models behind today's generative AI applications. As they increase in size, these models require more processing power to deal with massive datasets for training and deployment. The largest and most advanced models can scale from hundreds of billions to trillions of parameters.

To assist with the training and deployment of these growing models, Amazon announced the general availability of Elastic Compute Cloud Trn2 instances featuring 16 Trainium2 chips that provide 20.8 petaflops of peak compute performance. The company said these Trn2 instances offer 30% more compute and 25% more high-bandwidth memory than the next most powerful EC2 instances for the same cost. In testing, Meta Platforms Inc.'s Llama 405B, a model with 405 billion parameters, or data points it can be customized with, delivered more than three times higher token-generation throughput using Trn2 EC2 instances on Amazon Bedrock compared to similar offerings from rival major cloud providers. Token generation happens when a large language model is deployed and producing text answers to questions; the higher the throughput, the faster it can produce answers, summarize documents and generate responses.

For LLMs that scale even bigger in size, Amazon is releasing a second-tier Trainium2 offering called Trn2 UltraServers that will allow customers to go beyond the limits of a single Trn2 server. In an interview with SiliconANGLE, Gadi Hutt, senior director of business development at Annapurna Labs, the subsidiary of AWS that designs and builds the company's custom chips, said this will allow customers to reduce training time, get to market faster and improve model accuracy.

"Next, we break that [16-chip] boundary and provide 64 chips in the UltraServer and that is for extremely large models," said Hutt. "So if you have a 7 billion-parameter model, that used to be large, but not anymore -- or an extremely large model let's call it 200 billion or 400 billion. You want to serve at the fastest latency possible. So, you use the UltraServer."

Hutt explained that the new Trn2 UltraServers use NeuronLink interconnect to hook up four Trn2 servers into one giant server. This allows customers to scale up workloads across all four, providing 64 Trainium2 chips at once for AI model training or inference. An UltraServer can deliver up to 83.2 peak petaflops of compute, enough to serve trillion-parameter models in production.
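As a rough sanity check on the claim that an UltraServer has enough capacity to serve trillion-parameter models, a quick memory estimate helps. The sketch assumes 1-byte (FP8) weights, which is our assumption rather than a stated AWS deployment configuration; the HBM figures come from the articles above.

```python
# Does a trillion-parameter model fit in one Trn2 UltraServer's HBM?
# HBM figures are from the articles above (96 GB per chip, 64 chips per UltraServer);
# the FP8 (1 byte per parameter) weight format is an illustrative assumption.

params = 1e12                        # trillion-parameter model
bytes_per_param = 1                  # assumed FP8 weights
weight_bytes = params * bytes_per_param            # 1 TB of weights

hbm_per_chip = 96e9                  # 96 GB of HBM3 per Trainium2 chip
chips_per_ultraserver = 64
total_hbm = hbm_per_chip * chips_per_ultraserver   # ~6.1 TB per UltraServer

print(f"weights: {weight_bytes/1e12:.1f} TB of {total_hbm/1e12:.1f} TB HBM")
# Leaves roughly 5 TB of HBM for KV cache, activations, and runtime overhead.
```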
Amazon said UltraServers built on the upcoming Trainium3 are expected to deliver four times the performance of Trn2 UltraServers, allowing for superior real-time performance when training and deploying extremely large AI models. The first Trainium3-based instances are expected to be available in 2025.

In the same vein of ultra-large models, AWS is working with its partner Anthropic PBC to build an EC2 UltraCluster of Trn2 UltraServers named Project Rainier. Hutt said this cluster of UltraServers will comprise hundreds of thousands of Trainium2 chips interconnected with third-generation, low-latency networking. The intent is to provide enough scaled, distributed compute power to train the company's next-generation large language model. "This is by far the largest cluster we've built," Hutt said. "Compared to what Anthropic had until today, it is five times larger."

Anthropic introduced its most advanced LLM, Claude 3.5 Sonnet, in October. The company has continuously released upgraded models with enhanced capabilities on a regular drumbeat. Engineering, training and deploying these models takes a significant amount of computing power for the company to stay competitive against rivals such as OpenAI and Google LLC. Amazon recently announced plans to double its investment in Anthropic to $8 billion, tightening its partnership with the AI model provider.
[5]
Amazon's next-generation AI training chip is here
Amazon Web Services unveiled its next-generation artificial intelligence training chip, which it says is faster and expected to use less energy. Trainium3 is the first chip from AWS built with the 3-nanometer process -- so far the most advanced semiconductor technology -- which allows for better performance and power efficiency. The first Trainium3 chips are expected to be available late next year, AWS announced at its re:Invent conference on Tuesday.

UltraServers powered by Trainium3 are expected to perform four times better than those powered by Trainium2 chips, AWS said, "allowing customers to iterate even faster when building models and deliver superior real-time performance when deploying them."

The cloud giant's Trainium2 chips, which are four times faster than their predecessor, are now generally available, AWS said. The Trainium2-powered Amazon Elastic Compute Cloud (Amazon EC2) instances deliver 30% to 40% better price performance than current GPU-based instances and feature 16 Trainium2 chips. The new Amazon EC2 instances are "ideal for training and deploying LLMs with billions of parameters," AWS said.

The cloud giant said it is building an EC2 UltraCluster of Trainium2-powered UltraServers with AI startup Anthropic, called Project Rainier. In November, AWS announced it was following up a previous $4 billion investment in the AI startup with another $4 billion. In the next phase of the partnership, Anthropic will use AWS as its primary AI training partner.

"Trainium2 is purpose built to support the largest, most cutting-edge generative AI workloads, for both training and inference, and to deliver the best price performance on AWS," David Brown, vice president of Compute and Networking at AWS, said in a statement. "With models approaching trillions of parameters, we understand customers also need a novel approach to train and run these massive workloads. New Trn2 UltraServers offer the fastest training and inference performance on AWS and help organizations of all sizes to train and deploy the world's largest models faster and at a lower cost."

AWS chief executive Matt Garman also announced the next-generation P6 family of instances, featuring Nvidia's new Blackwell chips. Blackwell offers 2.5 times faster compute than the current generation of graphics processing units, or GPUs, Garman said.
[6]
Amazon AWS unveils Trainium3 chip, Project Rainier
At its annual re:Invent conference in Las Vegas, Amazon's AWS cloud computing service disclosed the third generation of its Trainium computer chip for training large language models (LLMs) and other forms of artificial intelligence (AI), a year after the debut of the second version of the chip. The new Trainium3 chip, which will become available next year, will be up to twice as fast as the existing Trainium2 while being 40% more energy-efficient, said AWS CEO Matt Garman during his keynote on Tuesday. Trainium3 is the first chip from AWS to use a three-nanometer semiconductor manufacturing process technology.

In the meantime, the Trainium2 chips unveiled a year ago are now generally available, said Garman. The chips are four times faster than the previous generation. The chips are geared toward LLM training, and Garman emphasized performance on Meta Platforms' popular open-source model, Llama. "Independent inference performance tests for Meta's Llama 405B showed that Amazon Bedrock, running on Trn2 instances, delivers more than 3x higher token-generation throughput compared to other available offerings by major cloud providers," the company says.

Amazon also announced UltraServers, a new offering for AWS's Elastic Compute Cloud service that connects 64 of the current Trainium2 chips "into one giant server" using NeuronLink interconnections. The servers are available now on EC2. The UltraServer is designed to handle LLMs with trillions of parameters, said Amazon.

To aid development for the Trainium parts, the company rolled out a software development kit, known as Neuron, that includes a compiler, runtime libraries, and tools optimized for Trainium. Neuron has native support for popular AI frameworks such as JAX and PyTorch, and for "over 100,000 models on the Hugging Face model hub".

Garman also gave a sneak peek at future developments. New versions of the UltraServers running Trainium3 are expected to be four times "more performant" than the Trainium2-based UltraServers, "allowing customers to iterate even faster when building models and deliver superior real-time performance when deploying them." The company said work is underway to build "Project Rainier", an "UltraCluster" grouping numerous UltraServers to give access to "hundreds of thousands of Trainium2 chips". The UltraCluster is being developed in partnership with generative AI startup Anthropic.
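The Neuron SDK described above hooks Trainium into standard frameworks; for PyTorch it works through the PyTorch/XLA path. Below is a minimal, hypothetical training-loop sketch of what that looks like on a Trn2 instance with torch-neuronx installed; the toy model, random data, and hyperparameters are placeholders, not AWS sample code.

```python
# Minimal sketch of training on Trainium via the Neuron SDK's PyTorch/XLA support.
# Assumes a Trn2 instance with the Neuron drivers and torch-neuronx installed;
# the model and data below are illustrative placeholders.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # XLA device API used by Neuron's PyTorch path

device = xm.xla_device()  # resolves to a NeuronCore on a Trainium instance

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(10):
    x = torch.randn(8, 1024).to(device)       # placeholder batch
    target = torch.randn(8, 1024).to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()
    # Step the optimizer and force the lazily built XLA graph to execute.
    xm.optimizer_step(optimizer, barrier=True)
```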
[7]
AWS' Trainium2 chips for building LLMs are now generally available, with Trainium3 coming in late 2025 | TechCrunch
At its re:Invent conference, AWS today announced the general availability of its Trainium2 (T2) chips for training and deploying large language models (LLMs). These chips, which AWS first announced a year ago, are four times as fast as their predecessors, with a single Trainium2-powered EC2 instance with 16 T2 chips providing up to 20.8 petaflops of compute performance. In practice, that means running inference for Meta's massive Llama 405B model as part of Amazon's Bedrock LLM platform will be able to offer "3x higher token-generation throughput compared to other available offerings by major cloud providers," according to AWS.

These new chips will also be deployed in what AWS calls the EC2 Trn2 UltraServers. These instances will feature 64 interconnected Trainium2 chips, which can scale up to 83.2 peak petaflops of compute. An AWS spokesperson informed us that the 20.8-petaflops figure is for dense models at FP8 precision, while the 83.2-petaflops value is for FP8 with sparse models. AWS notes that these UltraServers use a NeuronLink interconnect to link all of these Trainium chips together.

The company is working with Anthropic, the LLM provider AWS has put its (financial) bets on, to build a massive cluster of these UltraServers with "hundreds of thousands of Trainium2 chips" to train Anthropic's models. This new cluster, AWS says, will be 5x as powerful (in terms of exaflops of compute) as the cluster Anthropic used to train its current generation of models and, AWS also notes, "is expected to be the world's largest AI compute cluster reported to date."

Overall, those specs are an improvement over Nvidia's current generation of GPUs, which remain in high demand and short supply. They are dwarfed, however, by what Nvidia has promised for its next-gen Blackwell chips (with up to 720 petaflops of FP8 performance in a rack with 72 Blackwell GPUs), which should arrive -- after a bit of a delay -- early next year.

Maybe that's why AWS used this moment to announce its next generation of chips, Trainium3. For Trainium3, AWS expects another 4x performance gain for its UltraServers, and it promises to deliver this next iteration, built on a 3-nanometer process, in late 2025. That's a very fast release cycle, though it remains to be seen how long the Trainium3 chips will remain in preview and when they'll get into the hands of developers.

"Trainium2 is the highest performing AWS chip created to date," said David Brown, vice president of Compute and Networking at AWS, in the announcement. "And with models approaching trillions of parameters, we knew customers would need a novel approach to train and run those massive models. The new Trn2 UltraServers offer the fastest training and inference performance on AWS for the world's largest models. And with our third-generation Trainium3 chips, we will enable customers to build bigger models faster and deliver superior real-time performance when deploying them."

The Trn2 instances are now generally available in AWS' US East (Ohio) region (with other regions launching soon), while the UltraServers are currently in preview.
[8]
AWS Unveils Trainium2, Slashes AI Cost by 40%
The cloud giant also introduced Trn2 UltraServers and unveiled its next-generation Trainium3 AI chip.

At AWS re:Invent 2024, Amazon Web Services (AWS) announced the general availability of AWS Trainium2-powered Amazon Elastic Compute Cloud (EC2) instances. The new instances offer 30-40% better price performance than the previous generation of GPU-based EC2 instances. "Today, I'm excited to announce the GA of Trainium2-powered Amazon EC2 Trn2 instances," said AWS chief Matt Garman. In addition, the company introduced Trn2 UltraServers and unveiled its next-generation Trainium3 AI chip.

The Trn2 instances are built with 16 Trainium2 chips, delivering up to 20.8 petaflops of compute performance. They are intended for training and deploying large language models (LLMs) with billions of parameters. Trn2 UltraServers combine four Trn2 servers into a single system, offering 83.2 petaflops of compute for higher scalability. These UltraServers feature 64 interconnected Trainium2 chips. "The launch of Trainium2 instances and Trn2 UltraServers provides customers with the computational power needed to tackle the most complex AI models, whether for training or inference," said David Brown, AWS vice president of compute and networking.

AWS is working with Anthropic to create Project Rainier, a large-scale AI compute cluster powered by hundreds of thousands of Trainium2 chips. This infrastructure will support Anthropic's model development, including the optimisation of its flagship product, Claude, to run on Trainium2 hardware.

Databricks and Hugging Face have partnered with AWS to leverage Trainium2's capabilities for improved performance and cost efficiency in their AI offerings. Databricks plans to use the hardware to enhance its Mosaic AI platform, while Hugging Face is integrating Trainium2 into its AI development and deployment tools. Other Trainium2 customers include Adobe, Poolside, and Qualcomm. "Adobe is seeing very promising early testing after running Trainium2 against their Firefly inference model, and they expect to save significant amounts of money," said Garman. "Poolside expects to save 40% compared to alternative options," he added. "Qualcomm is using Trainium2 to deliver AI systems that can train in the cloud and then deploy at the edge."

AWS also previewed its Trainium3 chip, built on a 3-nanometer process node. Trainium3-powered UltraServers are expected in late 2025 and aim to deliver four times the performance of Trn2 UltraServers.

To optimise the use of Trainium hardware, AWS also introduced the Neuron SDK, a suite of software tools that enables developers to optimise their models for peak performance on Trainium chips. The SDK supports frameworks such as JAX and PyTorch, allowing customers to integrate the software into their existing workflows with minimal code changes. The Neuron SDK also supports over 100,000 models hosted on the Hugging Face model hub, further enhancing its accessibility for AI developers.

Trn2 instances are currently available in the US East (Ohio) region, with expansion to additional regions planned. UltraServers are in preview.
[9]
New Amazon Trainium 2 AI Chip: Amazon's Bold Move to Take on NVIDIA
Amazon has introduced its Trainium 2 AI chip, marking a significant move to challenge NVIDIA's dominance in the AI hardware market. With a focus on delivering exceptional performance and cost efficiency, Trainium 2 is designed to address the growing demand for advanced AI computing solutions. By using its extensive cloud infrastructure, fostering strategic collaborations, and maintaining a culture of innovation, Amazon is positioning itself as a key player in this rapidly evolving industry.

For years, NVIDIA has been the undisputed leader in this space, providing the chips that fuel much of the AI revolution. But now, Amazon is stepping into the ring. Let's be real -- taking on a giant like NVIDIA is no small feat. Amazon is betting big on Trainium 2, promising not just raw power but also cost efficiency and scalability that could make AI more accessible to businesses of all sizes. Yet there's more to this story than just hardware specs. From fostering a scrappy, startup-like culture within its massive corporate structure to tackling the steep challenge of building a user-friendly software ecosystem, Amazon's approach is as ambitious as it is complex. So, what makes Trainium 2 stand out, and can Amazon truly carve out its place in this high-stakes race?

Trainium 2 represents a substantial advancement in AI chip technology, offering a combination of performance, scalability, and cost efficiency that makes it a compelling choice for enterprises. Key features of Trainium 2 include roughly 1.3 petaflops of dense FP8 compute and 96GB of HBM3 per chip, NeuronLink-connected 16-chip instances and 64-chip UltraServers, and a claimed 30-40% price-performance advantage over GPU-based EC2 instances. These features address the increasing demand for high-performance AI solutions, offering enterprises a powerful tool to optimize their AI operations while managing costs effectively. Trainium 2's scalability is particularly noteworthy, as it enables organizations to tackle complex AI challenges with greater efficiency.

The development of Trainium 2 is deeply rooted in Amazon's culture of rapid innovation. The engineering team responsible for the chip operates out of a utilitarian lab in Austin, Texas, where flexibility and speed are prioritized. Despite being part of a $2 trillion corporation, the team functions with the agility of a startup, focusing on iterative development and practical problem-solving. This approach allows Amazon to remain competitive in the fast-paced AI hardware market. By fostering a culture that encourages experimentation and adaptability, Amazon can respond quickly to industry demands and technological advancements, ensuring that Trainium 2 meets the needs of its users.

Amazon is actively deploying Trainium 2 within its own AI operations while also collaborating with key partners to refine and showcase the chip's capabilities. These partnerships play a crucial role in demonstrating the real-world potential of Trainium 2 and driving its adoption across various industries. Notable collaborations include Anthropic, which is working with AWS on the Project Rainier training cluster, as well as Databricks, Hugging Face, Adobe, Poolside, and Qualcomm. These strategic alliances not only enhance the functionality of Trainium 2 but also reduce Amazon's reliance on NVIDIA's hardware, positioning the company as a more self-reliant and competitive player in the AI hardware market.

While Trainium 2's hardware capabilities are impressive, Amazon faces a significant challenge in building a robust software ecosystem to support its adoption. NVIDIA's CUDA platform has long been the industry standard, offering developers a mature, user-friendly, and flexible environment. In contrast, Amazon's Neuron SDK is still in its early stages and requires substantial improvements to attract developers and compete effectively.
Switching to a new hardware platform also involves considerable costs and time investments, which can deter companies already entrenched in NVIDIA's ecosystem. To overcome these hurdles, Amazon must prioritize the development of its software tools and provide comprehensive support to developers, ensuring a smooth transition to Trainium 2.

The AI chip market is becoming increasingly competitive, with several players vying for dominance. NVIDIA's leadership is built on its advanced hardware and robust software ecosystem, but supply chain constraints have created opportunities for competitors to gain ground. Amazon is not alone in this pursuit -- other cloud providers, such as Microsoft and Google, are also developing custom AI chips to reduce their reliance on NVIDIA. Trainium 2 represents Amazon's strategic response to this competitive landscape. By offering a high-performance, cost-efficient alternative, Amazon aims to carve out a significant share of the AI hardware market. However, success will depend on its ability to address software challenges and demonstrate the value of its solutions to potential customers.

Amazon's long-term vision centers on creating a comprehensive AI ecosystem through AWS. By treating the entire data center as a unified computing system, Amazon seeks to optimize performance and efficiency for its customers. This approach aligns with its broader strategy of becoming an "AI supermarket," offering a wide range of tools and services to support AI development. While maintaining a working relationship with NVIDIA, Amazon is gradually reducing its dependency by investing heavily in its own hardware and software solutions. This dual strategy allows Amazon to use the strengths of its existing partnerships while building a foundation for greater independence in the future.

Amazon has outlined an ambitious roadmap for its AI hardware development, with plans to release new chip generations every 18 months. This aggressive timeline underscores Amazon's commitment to innovation and its determination to establish itself as a leader in the AI hardware market. Reliability and quality control remain top priorities, ensuring that Trainium 2 and its successors meet the high standards expected by customers. As Amazon continues to refine its hardware and software offerings, it is well positioned to play a significant role in shaping the future of AI computing. By focusing on performance, cost efficiency, and strategic partnerships, Amazon is poised to challenge the status quo and drive meaningful advancements in the AI hardware space.
[10]
Dave Brown Talks Trainium 2: AWS's Secret Weapon for Generative AI Leadership - SiliconANGLE
Prior to re:Invent, Dave Brown, Vice President of Compute at AWS, shared a glimpse with me into the future of cloud computing. During an exclusive podcast interview, Brown unveiled the company's newest silicon innovation -- Trainium 2 -- and delved into how AWS is redefining the infrastructure landscape to meet the burgeoning demands of generative AI.

For years, AWS has been a driving force behind enterprise cloud computing, but as generative AI reshapes industries, the stakes have never been higher. Trainium 2 exemplifies the infrastructure innovation that AWS has cultivated, promising a blend of raw performance and price efficiency that Brown believes will revolutionize AI workloads for enterprises of all sizes.

According to Brown, AWS's foray into custom silicon began with a simple yet powerful question: how can a cloud provider maximize performance while controlling costs? Trainium 2 is the latest answer. Purpose-built for AI and machine learning workloads, the new chip delivers an impressive fourfold performance improvement over its predecessor, Trainium 1. Brown emphasized its importance, stating, "Generative AI is transformative, but for it to scale, price performance must be prioritized."

Each Trn2 instance boasts 16 Trainium 2 chips interconnected via AWS's proprietary NeuronLink protocol. This configuration allows workloads to utilize high-bandwidth memory and unified memory access across accelerators, enabling large-scale AI models to perform at unprecedented speeds. "This chip is our most advanced yet," Brown said. "It's designed to tackle the immense computational requirements of generative AI while keeping costs manageable." Early adopters such as Anthropic and Adobe have already integrated Trainium 2 into their operations, leveraging its 30-40% price-performance advantage over competing accelerators. "When you're training large language models with thousands of chips, a 40% savings can mean millions of dollars," Brown noted.

The AI revolution has created a renaissance in high-performance computing (HPC), an area traditionally dominated by elite industries like aerospace and defense. With the benefits of speed and cost-efficiency, AWS is democratizing access to supercomputing resources. According to Brown, a cornerstone of this effort is its Capacity Blocks offering, which allows customers to reserve compute resources for short-term projects. Brown explained, "Instead of committing to hardware for years, enterprises can access cutting-edge chips like Trainium 2 for a week or even a single day." Capacity Blocks have opened the door for startups and enterprises alike to explore ambitious projects, from indexing vast data lakes to training proprietary models. "What used to take months and millions of dollars is now accessible to companies of all sizes," Brown said. "That's the true promise of cloud computing."

AWS's layered approach to infrastructure ensures flexibility for diverse customer needs. At the foundational level, SageMaker simplifies machine learning operations by acting as an orchestrator for compute jobs. Brown described SageMaker as "mission control," managing node failures and optimizing clusters for training and inference workloads. For developers and enterprises seeking rapid deployment, Bedrock offers an abstraction layer for foundational AI models like Llama and Anthropic's Claude. This stack allows AWS to cater to a wide spectrum of use cases.
"SageMaker is ideal for those who need granular control, while Bedrock abstracts complexity, letting users focus on innovation rather than infrastructure," Brown said. "It's about meeting customers where they are in their AI journey." AWS's investment in custom silicon isn't just a technological differentiator -- it's a strategic necessity. The company's partnerships with industry leaders like NVIDIA complement its in-house innovations, creating a versatile ecosystem. Brown highlighted Project Ceiba, a 20,000-GPU cluster built in collaboration with NVIDIA. "Our goal is to make AWS the best place to run NVIDIA hardware while continuing to innovate with our own silicon," he said. AWS's partnership with Anthropic highlights the transformative potential of Trainium 2 infrastructure. Brown revealed that AWS is building a groundbreaking cluster of Trn2 UltraServers for Anthropic, containing hundreds of thousands of Trainium 2 chips. According to Brown, this cluster delivers over five times the exaflops of computational power used to train Anthropic's current generation of AI models. Leveraging AWS's elastic fabric adapter network, the tightly coupled design ensures unparalleled efficiency and scalability, crucial for training large language models. "A 40% cost savings on a cluster of this magnitude is incredibly significant," Brown emphasized. This unique integration highlights how AWS's next-generation infrastructure drives differentiation with partners like Anthropic to push the boundaries of what's possible in AI development, making breakthroughs more accessible and cost-effective for enterprises globally. Yet, AWS's commitment to hardware goes beyond collaboration. The Trainium and Graviton chip families illustrate how the company has steadily refined its silicon expertise. Brown traced this evolution back to the company's 2015 acquisition of Annapurna Labs, calling it "one of the most transformative deals in the industry." Building and maintaining high-performance compute systems is no small feat. AWS has embraced innovations like water cooling in its data centers to accommodate the thermal demands of modern accelerators. Brown explained, "When chips consume over 1,000 watts per accelerator, traditional air cooling just doesn't cut it." Operational challenges extend beyond cooling. The scale at which AWS operates allows the company to identify and resolve hardware faults that smaller data centers might never encounter. "At our scale, we're able to fix issues proactively, ensuring stability and performance for our customers," Brown said. While generative AI has captured the spotlight, Brown is quick to point out that AWS's innovation extends across the Compute stack. Kubernetes, often described as "the new Linux," remains a focus, with AWS introducing new features to simplify container orchestration. "Generative AI is exciting, but we're also pushing the envelope in other areas of infrastructure," Brown said. Looking ahead, AWS plans to continue its rapid pace of innovation. Brown hinted at the development of Trainium 3, which promises even greater performance gains. "We're just scratching the surface of what's possible," he said. AWS's advancements are not just technical achievements -- they're a blueprint for the future of cloud computing. Trainium 2, SageMaker, Bedrock, and Capacity Blocks collectively lower the barriers to entry for enterprises seeking to harness AI. Brown's advice to customers is simple: "Get hands on keyboard. Start small, experiment, and scale from there." 
AWS's Compute division is navigating a pivotal moment in the tech industry. With generative AI redefining what's possible, the company's investments in custom silicon, scalable infrastructure, and customer-centric solutions give AWS a strong hand in leading the next wave of cloud innovation.
Amazon Web Services announces its next-generation AI chip, Trainium3, promising a 4x performance boost over Trainium2. The company also launches Trainium2-powered cloud instances for high-performance AI computing.
Amazon Web Services (AWS) has announced its next-generation AI accelerator, Trainium3, at the re:Invent conference. Set to launch in late 2025, Trainium3 promises significant advancements in AI computing capabilities [1][2].
Key features of Trainium3 include: a 3-nanometer (TSMC) manufacturing process, a first for an AWS chip; roughly 40% better energy efficiency than Trainium2; Trainium3-based UltraServers expected to deliver four times the performance of Trn2 UltraServers; and availability expected in late 2025 [1][2][4].
While Trainium3 is on the horizon, AWS has made Trainium2-powered cloud instances generally available: EC2 Trn2 instances combine 16 Trainium2 chips over NeuronLink for up to 20.8 petaflops of dense FP8 compute and 1.5 TB of HBM3, with AWS claiming 30-40% better price performance than its GPU-based EC2 instances; the instances are available in the US East (Ohio) region, with more regions to follow [1][2][7].
AWS is pushing the boundaries of AI computing with larger configurations: Trn2 UltraServers link four Trn2 servers (64 chips) into a single 83.2-petaflop system, currently in preview, and Project Rainier, an EC2 UltraCluster being built with Anthropic, will combine hundreds of thousands of Trainium2 chips to deliver five times the compute Anthropic used to train its current models [1][4][7].
The introduction of Trainium2 and Trainium3 positions AWS as a strong competitor in the AI chip market: per-chip FP8 throughput approaches Nvidia's H100, and AWS is promising better price performance than GPU-based instances while iterating on new chip generations at a rapid cadence [1][2].
AWS is not solely relying on its custom silicon: it continues to offer Nvidia-based instances (including H200, L40S, and L4 accelerators) and is building Project Ceiba, a roughly 20,736-GPU Blackwell cluster, with Nvidia [2][10].
The development of Trainium3 and the scaling of Trainium2 instances signify AWS's commitment to advancing AI computing capabilities: the Neuron SDK supports JAX, PyTorch, and over 100,000 Hugging Face models to ease adoption, and Amazon has signaled plans for new chip generations on an aggressive release cadence [6][8][9].
As the AI chip race intensifies, AWS's innovations in custom silicon and cloud infrastructure are poised to play a crucial role in shaping the future of AI computing and applications.