21 Sources
[1]
Amazon challenges competitors with on-premises Nvidia 'AI Factories' | TechCrunch
Amazon announced a new product Tuesday called "AI Factories" that allows big corporations and governments to run its AI systems in their own data centers. Or as AWS puts it: customers supply the power and the data center, and AWS plunks in the AI system, manages it, and can tie it into other AWS cloud services. The idea is to cater to companies and governments concerned with data sovereignty, or absolute control over their data so it can't wind up in a competitor's or foreign adversary's hands. An on-prem AI Factory means not sending their data to a model maker, and not even sharing the hardware. If that product name sounds familiar, it should. That's what Nvidia calls its hardware systems that are chock full of tools needed to run AI, from its GPU chips to its networking tech. This AWS AI Factory is, in fact, a collaboration with Nvidia, both companies say. In this case, the AWS Factory will use a combination of AWS and Nvidia technology. Companies that deploy these systems can opt for Nvidia's latest Blackwell GPUs or Amazon's new Trainium3 chip. It uses AWS' homegrown networking, storage, databases and security and can tap into Amazon Bedrock -- the AI model selection and management service, and AWS SageMaker AI, the model building and training tool. Interestingly, AWS is far from the only giant cloud provider installing Nvidia AI Factories. In October, Microsoft showed off its first of many-to-come AI Factories rolling out into its global data centers to run OpenAI workloads. Microsoft didn't announce at the time that these extreme machines would be available for private clouds. Instead, Microsoft highlighted how it was leaning on a host of Nvidia AI Factory data center tech to build and connect its new "AI Superfactories," aka new state-of-the-art data centers being built in Wisconsin and Georgia. Last month, Microsoft also outlined the data centers and cloud services that would be built in local countries to address the data sovereignty issue. To be fair, its options also include "Azure Local," Microsoft's own managed hardware that could be installed on customer sites. Still, it is a bit ironic that AI is causing the biggest cloud providers to invest so heavily in corporate private data centers and hybrid clouds like it's 2009 all over again.
[2]
Amazon releases an impressive new AI chip and teases a Nvidia-friendly roadmap | TechCrunch
Amazon Web Services, which has been building its own AI training chips for years now, just introduced a new version known as Trainium3 that comes with some impressive specs. The cloud provider, which made the announcement Tuesday at AWS re:Invent 2025, also teased the next product on its AI training roadmap: Trainium4, which is already in the works and will be able to work with Nvidia's chips. AWS used its annual tech conference to formally launch Trainium3 UltraServer, a system powered by the company's state-of-the-art, 3 nanometer Trainium3 chip, as well as its homegrown networking tech. As you might expect, the third-generation chip and system offer big bumps in performance for AI training and inference over the second generation, according to AWS. AWS says the systems are more than four times faster, with four times more memory, not just for training, but for delivering AI apps at peak demand. Additionally, thousands of UltraServers can be linked together to provide an app with up to 1 million Trainium3 chips -- 10 times the previous generation. Each UltraServer can host 144 chips, according to the company. Perhaps more importantly, AWS says the chips and systems are also 40% more energy efficient than the previous generation. While the world races to build bigger data centers powered by astronomical gigawatts of electricity, data center giant AWS is trying to make systems that drink less, not more. It is, obviously, in AWS's direct interests to do so. But in its classic, Amazon cost-conscious way, it promises that these systems save its AI cloud customers money, too. AWS customers like Anthropic (of which Amazon is also an investor), Japan's LLM company Karakuri, Splashmusic, and Decart have already been using the third-gen chip and system and significantly cut their inference costs, Amazon said. AWS also presented a bit of a roadmap for the next chip, Trainium4, which is already in development. AWS promised the chip will provide another big step-up in performance and support Nvidia's NVLink Fusion high-speed chip interconnect technology. This means the AWS Trainium4-powered systems will be able to interoperate and extend their performance with Nvidia GPUs while still using Amazon's homegrown, lower-cost server rack technology. It's worth noting, too, that Nvidia's CUDA (Compute Unified Device Architecture) has become the de facto standard that all AI apps support. The Trainium4-powered systems may make it easier to woo big AI apps built with Nvidia GPUs in mind to Amazon's cloud. Amazon did not announce a timeline for Trainium4. If Amazon follows previous rollout timelines, we'll likely hear more about Trainium4 at next year's conference.
[3]
Amazon launches Trainium3 AI accelerator, competing directly against Blackwell Ultra in FP8 performance -- new Trn3 Gen2 UltraServer takes vertical scaling notes from Nvidia's playbook
Amazon Web Services this week introduced its next-generation Trainium3 accelerator for AI training and inference. As AWS puts it, the new processor is twice as fast as its predecessor and is four times more efficient. This makes it one of the best solutions for AI training and inference in terms of cost. In absolute numbers, Trainium3 offers up to 2,517 MXFP8 TFLOPS, roughly half of what Nvidia's Blackwell Ultra delivers. However, AWS's Trn3 UltraServer packs 144 Trainium3 chips per rack, and offers 0.36 ExaFLOPS of FP8 performance, thereby matching the performance of Nvidia's GB300 NVL72. This is a very big deal, as very few companies can challenge Nvidia's rack-scale AI systems. The AWS Trainium3 is a dual-chiplet AI accelerator that is equipped with 144 GB of HBM3E memory using four stacks, which provides peak memory bandwidth of up to 4.9 TB/s. Each compute chiplet, allegedly made by TSMC using its 3nm-class fabrication process, contains four NeuronCore-v4 cores (which feature an extended ISA compared to predecessors) and connects to two HBM3E memory stacks. The two chiplets are connected using a proprietary high-bandwidth interface and share 128 independent hardware data-movement engines (which are key for the Trainium architecture), collective communication cores that coordinate traffic between chips, and four NeuronLink-v4 interfaces for scale-out connectivity. A NeuronCore-v4 integrates four execution blocks (a tensor engine, a vector engine, a scalar engine, and a GPSIMD block) along with 32 MB of local SRAM that is explicitly managed by the compiler instead of being cache-controlled. From a software development standpoint, the core is built around a software-defined dataflow model in which data is staged into SRAM by DMA engines, processed by the execution units, and then written back; near-memory accumulation lets the DMA engines perform read-add-write operations in a single transaction. The SRAM is not coherent across cores and is used for tiling, staging, and accumulation rather than general caching. Perhaps the most interesting component of NeuronCore-v4 is the GPSIMD block, which integrates eight fully-programmable 512-bit vector processors that can execute general-purpose code written in C/C++ while accessing local SRAM. GPSIMD is integrated into NeuronCore because not everything in real AI models maps cleanly to a tensor engine. Modern AI workloads contain a lot of code for unusual data layouts, post-processing logic, indexing, and model-specific math. These are hard or inefficient to express as matrix operations, and running them on the host CPU would introduce latency and costly data transfers. GPSIMD solves this by providing real general-purpose programmable vector units inside the core, so such logic runs directly next to the tensors at full speed and using the same local SRAM. In short, NeuronCore-v4 operates as a tightly coupled dataflow engine in which tensor math, vector transforms, scalar control, and custom code all share a local 32MB scratchpad and are orchestrated by the Neuron compiler rather than by a warp scheduler as on Nvidia hardware. Performance-wise, Trainium3 outperforms its direct predecessor in FP8 compute (well, MXFP8) by almost two times and hits 2.517 PFLOPS per package (clearly ahead of Nvidia's H100/H200, but behind Blackwell B200/B300) and adds MXFP4 support. However, Trainium3's BF16, TF32, and FP32 performance remains on par with Trainium2, which clearly shows that AWS is betting on MXFP8 for training and inference going forward.
To that end, AWS has not expanded its BF16 (which is widely used for training nowadays) and FP32 capabilities, apparently comfortable with the performance it has, given that these formats are now used primarily for gradient accumulation, master weights, optimizer states, loss scaling, and some precision-sensitive operations. One interesting capability that Trainium3 has that is worth mentioning is the Logical NeuronCore Configuration (LNC) feature, which lets the Neuron compiler fuse four physical cores into a wider, automatically synchronized logical core with combined compute, SRAM, and HBM, which could be useful for very wide layers or big sequence lengths that are common with very large AI models. Much of Nvidia's success in recent quarters was driven by its rack-scale NVL72 solutions featuring 72 of its Blackwell GPUs, which support a massive scale-up world size and an all-to-all topology, something that is especially important for Mixture-of-Experts (MoE) models and autoregressive inference. This gives Nvidia a massive advantage over AMD and developers of custom accelerators, such as AWS. To enable this capability, Nvidia had to develop NVLink switches, sophisticated network cards, and DPUs, a massive silicon endeavor. However, it looks like AWS's Trn3 UltraServers will give Nvidia's GB300 NVL72 a run for its money. Trn3 UltraServers, powered by Trainium3 AI accelerators, will be offered in two sizes: one configuration packs 64 accelerators and presumably an Intel Xeon CPU, while the larger variant brings together 144 accelerators and Arm-based Graviton CPUs in a single rack-scale solution. In the larger system, the 144 Trainium3 accelerators are distributed across 36 physical servers with one Graviton CPU and four Trainium3 chips installed in each machine. In many ways, such an arrangement resembles Nvidia's NVL72 approach, which uses Nvidia's CPU, GPU, and connectivity silicon, highlighting AWS' direction of building vertically integrated AI platforms. Within a server, Trainium3 accelerators are linked through a first NeuronSwitch-v1 layer using NeuronLink-v4 (at 2 GiB/s per device, though it is unclear whether this is unidirectional or aggregate bidirectional bandwidth), and communication between different servers is routed through two additional NeuronSwitch-v1 fabric layers, again carried over NeuronLink-v4. Unfortunately, AWS does not publish aggregate NeuronSwitch-v1 bandwidth across the domain. From a performance standpoint, the larger configuration with 144 Trainium3 chips delivers 362.5 PFLOPS of dense MXFP8/MXFP4 performance (on par with the GB300 NVL72), 96.624 PFLOPS of BF16/FP16/TF32 throughput, and 26.352 PFLOPS in FP32. The system is also equipped with 21 TB of HBM3E memory, featuring an aggregate memory bandwidth of 705.6 TB/s, leaving Nvidia's GB300 NVL72 behind in this metric. In general, Trn3 Gen2 UltraServer appears very competitive against Nvidia's GB300 NVL72 in terms of FP8 performance. FP8 is about to get more popular for training, so betting on this format makes a lot of sense. Of course, Nvidia has an ace up its sleeve in the form of NVFP4, which is positioned both for inference and training, and armed with this format, the company's Blackwell-based machines are unbeatable. The same applies to BF16, which got faster compared to Trainium2, but not enough to beat Nvidia's Blackwell.
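The rack-level figures quoted above follow directly from the per-chip specifications; as a quick sanity check, the short Python sketch below reproduces the arithmetic using only numbers from this article (small differences come from rounding the per-chip figure).

```python
# Back-of-envelope check of the Trn3 Gen2 UltraServer figures quoted above,
# derived from the per-chip numbers (2.517 PFLOPS dense MXFP8, 144 GB HBM3E,
# 4.9 TB/s of memory bandwidth per Trainium3 package).
chips_per_rack = 144
pflops_per_chip = 2.517        # dense MXFP8
hbm_gb_per_chip = 144
bw_tbs_per_chip = 4.9

rack_pflops = chips_per_rack * pflops_per_chip          # ~362.4 PFLOPS (~0.36 ExaFLOPS)
rack_hbm_tb = chips_per_rack * hbm_gb_per_chip / 1000   # ~20.7 TB of HBM3E
rack_bw_tbs = chips_per_rack * bw_tbs_per_chip          # ~705.6 TB/s aggregate bandwidth

print(f"Rack MXFP8 compute:   {rack_pflops:.1f} PFLOPS")
print(f"Rack HBM3E capacity:  {rack_hbm_tb:.1f} TB")
print(f"Rack HBM3E bandwidth: {rack_bw_tbs:.1f} TB/s")
```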
Overall, while the AWS Trn3 Gen2 UltraServer with 144 Trainium3 accelerators looks quite competitive when it comes to FP8 compared to Nvidia's Blackwell-based NVL72 machines, Nvidia's solution is more universal in general. In addition to rolling out new AI hardware, AWS announced a broad expansion of its AWS Neuron software stack at its annual re:Invent conference this week. AWS positions this release as a shift toward openness and developer accessibility, so the update promises to make Trainium platforms easier to adopt, let standard machine learning frameworks run directly on Trainium hardware, give users deeper control over performance, and even expose low-level optimization paths for experts. A major addition is native PyTorch integration through an open-source backend named TorchNeuron. Using PyTorch's PrivateUse1 mechanism, Trainium now appears as a native device type, which enables existing PyTorch code to execute without modification. TorchNeuron also supports interactive eager execution, torch.compile, and distributed features such as FSDP and DTensor, and it works with popular ecosystems including TorchTitan and Hugging Face Transformers. Access to this feature is currently restricted to select users as part of the private preview program. AWS also introduced an updated Neuron Kernel Interface (NKI) that gives developers direct control over hardware behavior, including instruction-level programming, explicit memory management, and fine-grained scheduling, exposing Trainium's instruction set to kernel developers. In addition, the company has released the NKI Compiler as open source under Apache 2.0. The programming interface is available publicly, while the compiler itself remains in limited preview. AWS also released its Neuron Explorer, a debugging and tuning toolkit that lets software developers and performance engineers improve how their models run on Trainium. This is done by tracing execution from high-level framework calls all the way down to individual accelerator instructions, while offering layered profiling, source-level visibility, integration with development environments, and AI-guided suggestions for performance tuning. Finally, AWS introduced its Neuron Dynamic Resource Allocation (DRA) to integrate Trainium directly into Kubernetes without the need for custom schedulers. Neuron DRA relies on the native Kubernetes scheduler and adds hardware-topology awareness to enable complete UltraServers to be allocated as a single resource and then flexibly assigns hardware for each workload. Neuron DRA supports Amazon EKS, SageMaker HyperPod, and UltraServer deployments, and is provided as open-source software with container images published in the AWS ECR public registry. Both Neuron Explorer and Neuron DRA are designed to simplify cluster management and give users fine-grained control over how Trainium resources are assigned and used. In a nutshell, AWS is moving closer to making its Trainium-based platforms much more ubiquitous than they are today, in an effort to make them more competitive against CUDA-based offerings from Nvidia. This week, Amazon Web Services released its third-generation Trainium accelerator for AI training and inference, as well as the accompanying Trn3 UltraServer rack-scale solutions. For the first time, Trn3 Gen2 UltraServer rack-scale machines will rely solely on AWS in-house hardware, including CPU, AI accelerators, switching hardware, and connectivity fabrics, signalling that the company has adopted Nvidia's vertical integration hardware strategy.
AWS claims that its Trainium3 processor offers roughly 2X higher performance and 4X better energy efficiency than Trainium2 as each accelerator delivers up to 2.517 PFLOPS (MXFP8) -- beating Nvidia's H100, but trailing B200 -- and is accompanied by 144 GB of HBM3E with 4.9 TB/s of bandwidth. Meanwhile, Trn3 Gen2 UltraServers scale to 144 accelerators for about 0.36 ExaFLOPS FP8 performance, which brings it on par with Nvidia's GB300 NVL72 rack-scale solution. Nonetheless, Nvidia's hardware still looks more universal than AWS's. To catch up with Nvidia, Amazon also announced major updates to its Neuron software stack to make Trainium-based platforms easier to use, allow standard machine-learning frameworks to run natively on the hardware, give developers greater control over performance, and open access to low-level tuning for experts.
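To illustrate the TorchNeuron integration described above, here is a minimal sketch of what device-agnostic PyTorch code looks like once an out-of-tree backend registers itself through the PrivateUse1 mechanism. The `torch_neuron` import and the `neuron` device string are assumptions for illustration only; AWS has not published the final package or device names, and the feature is in private preview.

```python
import torch

# Hypothetical usage sketch: per AWS, TorchNeuron registers Trainium as a native
# PyTorch device type via the PrivateUse1 hook. The package name "torch_neuron"
# and device string "neuron" are assumptions, not confirmed identifiers.
try:
    import torch_neuron  # noqa: F401  # importing the backend would register the device
    device = torch.device("neuron")
except ImportError:
    device = torch.device("cpu")  # fallback so the sketch runs anywhere

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)

# Eager execution and torch.compile both dispatch to whichever backend is
# registered; existing model code only changes its device string.
compiled = torch.compile(model)
print(compiled(x).shape, device)
```

The point of the PrivateUse1 route is that the rest of a training loop, including the FSDP and DTensor features mentioned above, stays standard PyTorch.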
[4]
AWS AI Factories: AI-in-a-box for enterprise datacenters
If sovereignty or on-prem AI matters, the new AI Factories could be for you. re:Invent: Many businesses and government agencies require that all sensitive data stay on-premises for legal or security reasons. If those orgs want to work with AI, they can't rely on regular public clouds, but now they can let AWS build and manage AI hardware and software in their datacenters. Announced Tuesday at the company's re:Invent conference, AWS AI Factories is a fully managed solution where enterprises provide the datacenter and power while the house of Bezos installs its hardware and software under their roof. They operate like a private AWS Region, using customers' existing datacenter space, power, and network links while AWS brings in and manages its own racks of infrastructure, including compute, storage, database, and AI services, all running locally. Customers will not need to worry about acquiring hardware, installing it, or buying or building software platforms for their AI models. All they need is the physical space in a datacenter and enough power capacity to juice all those GPUs. That will save orgs a lot of time and spare them the need for in-house expertise. Customers can use AWS tools like the Amazon Bedrock foundation model builder or SageMaker machine-learning platform, as well as some high-end hardware, such as the company's Trainium3 AI accelerators and Nvidia GPUs like the current-gen B200 and GB200, or next-gen GB300 and B300. It will use a petabit-scale, non-blocking network to connect the GPUs and offer Amazon FSx for Lustre and Amazon S3 Express One Zone storage technology. AWS does not yet support NVLink Fusion, which is a high-speed, chip-to-chip interconnect, but says support will arrive in its future Trainium4 chips. Amazon didn't disclose how much this will cost, so we don't yet know whether it will be more expensive than installing one's own hardware and software from scratch. "The AI factories operate exclusively for each customer and it helps them with that separation, maintaining the security and reliability you get from AWS while also meeting stringent compliance and sovereignty requirements," AWS CEO Matt Garman said in his keynote. Garman said the work was inspired by the service's efforts to spin up private, secure AI capabilities for the Kingdom of Saudi Arabia's AI endeavors, working with the ironically named AI platform company Humain. The company is helping build an "AI Zone" in the country, which will have up to 150,000 AI chips and dedicated AWS infrastructure. According to Garman, the AWS partnership provides Saudi Arabia with high-performing infrastructure, models, and AI services like SageMaker and Bedrock, while meeting the kingdom's security, privacy, and responsible AI standards. "This type of work has sparked some interest in others: large government organizations in the public sector who are interested in a similar concept," Garman said. "We sat back and asked ourselves: could we deliver this type of AI zone to a broader set of customers? Maybe even something that could leverage customers' existing datacenters?" In building its own AI factories, AWS faces some stiff competition. Dell's AI Factory with Nvidia was introduced in early 2024, and its promises of an edge-to-datacenter solution have captured billions in sales. In May, the company boasted 3,000 customers for its AI Factories, and last week, Dell said it had shipped $15.6 billion in AI servers year to date [PDF].
HPE's own private AI cloud product, which is also backed by Nvidia, won adoption by more than 300 new logos during the quarter ended July 31, it announced during its last earnings call [PDF]. Lenovo has also announced [PDF] a massive upswing in the sale of its infrastructure solutions, up 24 percent year over year in the quarter ended September 30, including "high double digit growth" of AI servers. Amazon's timing may be a bit of a mismatch, as polling from analysts at Forrester and Gartner indicates that the purse strings are tightening on AI spending as customers want a solid track to a return on their investment with these systems, which can cost millions of dollars to stand up, not to mention the price to power, maintain, and operate them. Naveen Chhabra, principal analyst at Forrester, speculated that AWS has likely been building this for several quarters, and suggested AWS is playing into the challenges that customers face when deploying AI infrastructure, including cooling, long lead times for products, the piecemeal approach to architecture, and data sovereignty. According to a November 21 report that Chhabra co-wrote, the revenue achieved as a result of AI spending is lagging and customers are taking notice. "With free cash flow tightening, interest rates still high, and even OpenAI's CEO warning of a dot-com-style bubble, the sector faces a reckoning," Forrester stated. The analyst group also posited that the looming chip shortage is causing limited and uncertain access to memory across devices, while pushing vendors to rethink their architecture. Chhabra sees many customers moving AI workloads to the cloud to combat these challenges, but that won't work if you are required to keep all of your data on-prem, which is the target market for AWS AI Factories. ®
[5]
Amazon announces new AI chips, closer Nvidia ties -- but it's cloud capacity that matters most
Amazon Web Services' two-track approach to artificial intelligence came into better focus Tuesday as the world's biggest cloud pushed forward with its own custom chips and got closer to Nvidia. During Amazon's annual AWS re:Invent 2025 conference in Las Vegas, Amazon Web Services CEO Matt Garman unveiled Trainium3 -- the latest version of the company's in-house custom chip. It has four times more compute performance, energy efficiency, and memory bandwidth than previous generations. AWS said that early results from customers testing Trainium3 show AI training and inference costs reduced by up to 50%. Custom chips, like Trainium, are becoming more and more popular for the big tech companies that can afford to make them. And their use cases are broadening. For example, Google's tensor processing units (TPUs), co-designed by Broadcom, have also been getting a lot of attention since last month's launch of the well-received Gemini 3 artificial intelligence model. It is powered by TPUs. There was even a report that Meta Platforms was considering TPUs in addition to Nvidia's graphics processing units (GPUs), which are the gold standard for all-purpose AI workloads. At the same time, Amazon also announced that it's deepening its work with Nvidia. In Tuesday's keynote, Garman introduced AWS AI Factories, which provides on-premises AI infrastructure for customers to use in their own data centers. The service combines Trainium accelerators and Nvidia graphics processing units, which allows customers to access Nvidia's accelerated computing platform, full-stack AI software, and GPU-accelerated applications. By offering both options, Amazon aims to keep accelerating AWS cloud capacity and, in turn, revenue growth to stay on top during a time of intense competition from Microsoft's Azure and Alphabet's Google Cloud, the second- and third-place horses in the AI race, by revenue. Earlier this year, investors were concerned when second-quarter AWS revenue growth did not live up to its closest competitors. In late October's release of Q3 results, Amazon went a long way to putting those worries to rest. Amazon CEO Andy Jassy said at the time, "AWS is growing at a pace we haven't seen since 2022, re-accelerating to 20.2% YoY." He added, "We've been focused on accelerating capacity -- adding more than 3.8 gigawatts (GW) in the past 12 months." Tuesday's announcements come at a pivotal time for AWS as it tries to rapidly expand its computing capacity after a year of supply constraints that put a lid on cloud growth. As great as more efficient chips are, they don't make up for the capacity demand that the company is facing as AI adoption ramps up, which is why adding more gigawatts of capacity is what Wall Street is laser-focused on. Fortunately, Wall Street argues that the capacity headwind should flip to a tailwind. Wells Fargo said Trainium3 is "critical to supplementing Nvidia GPUs and CPUs in this capacity build" to close the gap with rivals. In a note to investors on Monday, the analysts estimated Amazon will add more than 12 gigawatts of compute by year-end 2027, boosting total AWS capacity to support as much as $150 billion in incremental annual AWS revenue if demand remains strong. In a separate note, Oppenheimer said Monday that AWS has already proven its ability to improve capacity, which has already doubled since 2022. Amazon plans to double it again by 2027. The analysts said that such an expansion could translate to 14% upside to 2026 AWS revenue and 22% upside in 2027.
Analysts said each incremental gigawatt of compute added in recent quarters translated to roughly $3 billion of annual cloud revenue. Bottom line: While new chips are welcome news that helps AWS step deeper into the AI chip race, Amazon's investment in capacity and when that capacity will be unlocked is what investors are more locked in on, because that's how it will fulfill demand. The issue is not a demand issue; it's a supply issue. We are confident in AWS' ability to add the capacity. In fact, there's no one company in the world that could deal with this kind of logistics problem, at this scale, better than Amazon. Amazon shares surged nearly 14% to $254 each in the two sessions following the cloud and e-commerce giant's late Oct. 30 earnings print. The stock has since given back those gains and then some. As of Tuesday's close, shares were up 6.5% year to date, a laggard among its "Magnificent Seven" peers, and underperforming the S&P 500's roughly 16% advance in 2025.
[6]
Amazon to fuse Nvidia's NVLink into Trainium4 accelerators
Meanwhile, Trainium3 makes its debut promising million-chip training clusters. re:Invent: Amazon says that its next generation of homegrown silicon will deliver 6x higher performance thanks to a little help from its buddy Nvidia. At its re:Invent convention in Las Vegas on Tuesday, Amazon Web Services (AWS) teased its Trainium4 accelerators, which will be among the first to embrace Nvidia's NVLink Fusion interconnect tech for chip-to-chip communications. NVLink is a high-speed interconnect that allows multiple GPUs spanning multiple systems to pool resources and behave like a single accelerator. Previously, this technology had been limited to Nvidia CPUs and GPUs, but back in May, the AI infrastructure giant announced it was opening the tech to others with the introduction of NVLink Fusion at Computex. Amazon claims that the technology will allow its Trainium4 accelerators, Graviton CPUs, and EFA networking tech to communicate seamlessly across Nvidia's MGX racks. In its current form, Nvidia's fifth-gen NVLink fabrics support up to 1.8 TB/s of bandwidth (900 GB/s in each direction) per GPU, but the company is on track to double that to 3.6 TB/s by next year. Beyond Nvidia's interconnect tech, details are somewhat vague. We're told that the new chips will deliver 3x more FLOPS at FP8, 6x the performance at FP4, and 4x the memory bandwidth. Whether those claims pertain to the individual chips or its UltraServer rack systems, Amazon hasn't said. Assuming it's the rack systems, as was the case with Trainium3, that suggests AWS's Trainium4 UltraServers could deliver upwards of 2 exaFLOPS of dense FP4 performance and 2.8 petabytes a second of memory bandwidth. That latter point is likely to be a major boon for bandwidth-bound inference workloads. Despite a rather confusing naming convention, AWS actually employs Trainium for both internal and external training and inference. Of course, the devil is in the details, and we simply don't have all of them yet. Amazon made similar claims about its Trainium3 UltraServers this time last year, boasting a 4.4x uplift in compute over its Trainium2 racks. But while technically true, what we didn't know at the time was that roughly half that uplift would come from more than doubling the number of chips from 64 to 144. Speaking of Trainium3, a year after first teasing the chips, Amazon is finally ready to bring its third generation of Trainium accelerators to the general market. According to AWS, each chip is equipped with 144 GB of HBM3E memory, good for 4.9 TB/s of memory bandwidth, and is capable of churning out just over 2.5 petaFLOPS of dense FP8 performance. However, for jobs that benefit from sparsity, like training, the chips are even more potent. Trainium3 features 16:4 structured sparsity, which effectively quadruples the chip's performance to 10 petaFLOPS for supported workloads. Amazon's Trainium3 UltraServers cram 144 of these chips connected in an all-to-all fabric using its NeuronSwitch-v1 interconnect tech, which Amazon says offers twice the chip-to-chip bandwidth. This is a marked change from Amazon's Trainium2 UltraServers, which featured 64 accelerators arranged in a 4x4x4 3D torus topology. Amazon declined to comment on how the 144 Trainium3 accelerators are connected to one another, but if we had to guess, it likely resembles the flat switched topology used in Nvidia's NVL72 or AMD's Helios rack systems.
Such a move should ease the transition to NVLink Fusion in the next generation, but it leaves Google as one of the few chip designers still using mesh topologies in large-scale AI training and inference clusters. In any case, Amazon seems confident that its new interconnect tech and EFA networking will enable it to support production deployments containing up to a million accelerators, compared to the 500,000 Trainium2 chips found in Project Rainier. Combined, each Trainium3 UltraServer features 20.7 TB of HBM3E, 706 TB/s of memory bandwidth, and between 363 and 1,452 petaFLOPS depending on whether your workload actually benefits from sparsity or not. This puts the systems roughly on par with Nvidia's latest Blackwell Ultra-based GB300 NVL72 systems - at least at FP8. At FP4, the gap widens considerably with the Nvidia system delivering more than 3x the performance. With that said, FP4 is still primarily used in inference, while higher-precision datatypes like BF16 and FP8 are preferred for training. Despite Trainium's advancements in performance, some customers still aren't ready to abandon Nvidia just yet. Because of this, Amazon has also announced the availability of new compute offerings based on Nvidia's GB300 NVL72, which join the company's existing GB200 instances. ®
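The sparsity and Trainium4 numbers above are easy to reproduce; the sketch below redoes the back-of-envelope math, assuming (as the article does) that the claimed 6x FP4 and 4x memory-bandwidth uplifts apply to whole UltraServers and that Trainium3's dense FP4 rate matches its dense FP8 rate.

```python
# Reproducing the rack-scale estimates above. Assumptions: the Trainium4
# multipliers apply at rack scale, and Trainium3's dense FP4 throughput
# equals its dense FP8 throughput.
chips = 144
dense_fp8_pflops_per_chip = 2.52   # "just over 2.5 petaFLOPS" per chip
sparsity_factor = 4                # 16:4 structured sparsity quadruples throughput

rack_dense = chips * dense_fp8_pflops_per_chip    # ~363 PFLOPS dense
rack_sparse = rack_dense * sparsity_factor        # ~1,452 PFLOPS with sparsity
print(f"Trn3 rack: {rack_dense:.0f} dense / {rack_sparse:.0f} sparse petaFLOPS")

# Trainium4 extrapolation: 6x FP4 compute and 4x memory bandwidth per rack.
trn3_rack_bw_tbs = 706
print(f"Trn4 rack FP4 guess:       ~{rack_dense * 6 / 1000:.1f} exaFLOPS dense")
print(f"Trn4 rack bandwidth guess: ~{trn3_rack_bw_tbs * 4 / 1000:.1f} PB/s")
```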
[7]
Amazon introduces new Trainium3 chip offering 4x AI performance
AWS says these customers reported stronger inference performance, faster iteration, and reduced billable compute hours. AWS also offered a first look at Trainium4. The company has not announced a release window, but it confirmed that Trainium4 will deliver another large step in performance. The most notable shift is compatibility with NVIDIA NVLink Fusion, a high-speed chip-to-chip interconnect that supports large, tightly coupled AI compute clusters. This direction suggests AWS is no longer trying to position its hardware as a strict alternative to NVIDIA chips. Instead, it is building a hybrid ecosystem where AWS silicon can plug into, extend, or complement NVIDIA GPU systems. This could make it easier for companies building on CUDA, which remains the industry standard for AI development, to adopt AWS custom hardware without rebuilding their software stack. Alongside the chip announcement, AWS and NVIDIA expanded their strategic partnership. The collaboration covers interconnect technology, datacenter infrastructure, open model support, and rack-scale AI system deployment.
[8]
AWS wants to be a part of Nvidia's "AI Factories" - and it could change everything about how your business treats AI
Collaboration including some of the latest hardware around. AWS has announced it will be teaming up with Nvidia to develop and build so-called "AI Factories" - hyper-powered facilities bringing together some of the most powerful hardware around to power the next generation of AI. Revealed at AWS re:Invent 2025, the news will see the two tech giants work together on AI Factories going forward, including Nvidia AI hardware and AWS' latest Trainium chips, networking tools, storage and database technology. Bringing all this together will help organizations and governments alike develop AI Factory technology to scale up their workloads and take the next step forward. AWS says this new approach offers companies a much more streamlined and effective way to scale and develop their AI projects, which can involve huge expense and amounts of time and other resources. Instead, AI Factories will deploy dedicated AWS AI infrastructure directly into the customer's data center, ensuring high customization but also security and oversight - in effect creating something like a private AWS region for the customer, the company says. In addition, AWS and Nvidia's partnership will now mean customers can build and run LLMs faster at scale, giving AWS customers access to the full stack of Nvidia's AI software. This includes the latest NVIDIA Grace Blackwell and the next-generation NVIDIA Vera Rubin platforms, with support for Nvidia's NVLink Fusion high-speed chip interconnect technology coming soon to next generation Trainium4 chips. "Large-scale AI requires a full-stack approach -- from advanced GPUs and networking to software and services that optimize every layer of the data center. Together with AWS, we're delivering all of this directly into customers' environments," said Ian Buck, vice president and general manager of Hyperscale and HPC at Nvidia. "By combining NVIDIA's latest Grace Blackwell and Vera Rubin architectures with AWS's secure, high-performance infrastructure and AI software stack, AWS AI Factories allow organizations to stand up powerful AI capabilities in a fraction of the time and focus entirely on innovation instead of integration."
[9]
Amazon unveils new AI chip in battle against Nvidia
San Francisco (United States) (AFP) - Amazon Web Services launched its in-house-built Trainium3 AI chip on Tuesday, marking a significant push to compete with Nvidia in the lucrative market for artificial intelligence computing power. The move intensifies competition in the AI chip market, where Nvidia currently dominates with an estimated 80- to 90-percent market share for products used in training large language models that power the likes of ChatGPT. Google last week caused tremors in the industry when it was reported that Facebook-parent Meta would employ Google AI chips in data centers, signaling new competition for Nvidia, currently the world's most valuable company and a bellwether for the AI investment frenzy. This followed the release of Google's latest AI model last month that was trained using the company's own in-house chips, not Nvidia's. AWS, which will make the technology available to its cloud computing clients, said its new chip is lower cost than rivals and delivers over four times the computing performance of its predecessor while using 40 percent less energy. "Trainium3 offers the industry's best price performance for large scale AI training and inference," AWS CEO Matt Garman said at a launch event in Las Vegas. Inference is the execution phase of AI, where the model stops scouring the internet for training and starts performing tasks in real-world scenarios. Energy consumption is one of the major concerns about the AI revolution, with major tech companies having to scale back or pause their net-zero emissions commitments as they race to keep up on the technology. AWS said its chip can reduce the cost of training and operating AI models by up to 50 percent compared with systems that use equivalent graphics processing units, or GPUs, mainly from Nvidia. "Training cutting-edge models now requires infrastructure investments that only a handful of organizations can afford," AWS said, positioning Trainium3 as a way to democratize access to high-powered AI computing. AWS said several companies are already using the technology, including Anthropic, maker of the Claude AI assistant and a competitor to ChatGPT-maker OpenAI. AWS also announced it is already developing Trainium4, expected to deliver at least three times the performance of Trainium3 for standard AI workloads. The next-generation chip will support Nvidia's technology, allowing it to work alongside that company's servers and hardware. Amazon's in-house chip development reflects a broader trend among cloud providers seeking to reduce dependence on external suppliers while offering customers more cost-effective alternatives for AI workloads. Nvidia puzzled industry observers last week when it responded to Google's successes in an unusual post on X, saying the company was "delighted" by the competition before adding that Nvidia "is a generation ahead of the industry."
[10]
AWS brings sovereign AI on-prem with new AI Factories alongside Trainium3 and Nvidia GB300 launches - SiliconANGLE
Amazon Web Services Inc. today made a set of artificial intelligence infrastructure announcements spanning sovereign on-premises deployments, next-generation custom AI accelerators and the most advanced Nvidia Corp. GPU instances yet offered on AWS -- all part of a push to dominate both cloud and private AI at large scale. The announcements included the launch of AWS AI Factories, the general availability of Amazon EC2 Trn3 UltraServers powered by the new Trainium3 chip and the introduction of P6e-GB300 UltraServers featuring Nvidia's latest Blackwell-based GB300 NVL72 platform. Leading the announcements is AWS AI Factories, a new offering that delivers dedicated, full-stack AWS AI infrastructure directly inside customers' existing data centers. The platform combines Nvidia accelerated computing, AWS Trainium chips, high-speed low-latency networking, energy-efficient infrastructure and core AWS AI services, including Amazon Bedrock and Amazon SageMaker. AWS AI Factories have been built primarily for governments and regulated industries and operate similarly to a private AWS Region to provide secure, low-latency access to compute, storage and AI services while ensuring strict data sovereignty and regulatory compliance. With the offering, customers can leverage their own facilities, power and network connectivity, while AWS handles deployment, operations and lifecycle management. AWS says that the result compresses deployment timelines that would normally take years. As part of the AI Factories announcement, AWS also highlighted its deepening partnership with Nvidia around the platform, including support for Grace Blackwell and future Vera Rubin GPU architectures and future support for Nvidia NVLink Fusion interconnects in Trainium4. "Large-scale AI requires a full-stack approach -- from advanced GPUs and networking to software and services that optimize every layer of the data center," said Ian Buck, vice president and general manager of Hyperscale and HPC at Nvidia. "Together with AWS, we're delivering all of this directly into customers' environments." AWS also announced that its Amazon EC2 Trn3 UltraServers, powered by the new three-nanometer Trainium3 AI chip, are now generally available. Trn3 systems can scale up to 144 Trainium3 chips in a single UltraServer to deliver up to 4.4 times more compute performance, four times greater energy efficiency and nearly four times more memory bandwidth than Trainium2. The UltraServers are designed for next-generation workloads such as agentic AI, mixture-of-experts models and large-scale reinforcement learning, with AWS-engineered networking that delivers sub-10-microsecond chip-to-chip latency. In testing using OpenAI Group PBC's open-weight model GPT-OSS, AWS customers achieved three times higher throughput per chip and four times faster inference response times versus the previous generation. Customers including Anthropic PBC, Karakuri Ltd., Metagenomi Inc., Neto.ai Inc., Ricoh Company Ltd. and Splash Music Inc. are already reporting up to 50% reductions in training and inference costs. AWS also previewed Trainium4, which is expected to deliver major gains in FP4 and FP8 performance and memory bandwidth. Rounding out the AI infrastructure announcements, AWS introduced the new P6e-GB300 UltraServers, featuring Nvidia's GB300 NVL72 platform, making it the most advanced Nvidia GPU architecture available in Amazon EC2.
The instances deliver the highest GPU memory and compute density on AWS, targeting trillion-parameter AI inference and advanced reasoning models in production. The P6e-GB300 systems run on the AWS Nitro System and integrate tightly with services such as Amazon Elastic Kubernetes Service and, in doing so, allow customers to deploy large-scale inference workloads securely and efficiently.
[11]
AWS Launches Trainium3 UltraServers, Gives a Peek Into Trainium4 | AIM
At re:Invent 2025, AWS announced the general availability of its new Amazon EC2 Trn3 UltraServers, powered by the Trainium3 chip built on 3nm technology, to help customers train and deploy AI models faster and at lower cost. The company said the new servers deliver up to 4.4x more compute performance, 4x greater energy efficiency, and almost 4x more memory bandwidth compared to the previous Trainium2 generation. Each UltraServer can scale up to 144 Trainium3 chips, offering as much as 362 FP8 petaflops of compute. Trainium3 follows AWS's earlier deployment of 500,000 Trainium2 chips in Project Rainier, created with Anthropic and described as the world's largest AI compute cluster. AWS also revealed early details of Trainium4, expected to deliver at least 6x the processing performance in FP4, along with higher FP8 performance and memory bandwidth. The next-generation chip will support NVIDIA NVLink Fusion interconnects to operate alongside NVIDIA GPUs and AWS Graviton processors in MGX racks. AWS has already deployed more than 1 million Trainium chips to date. The company says the latest performance improvements translate to faster training and lower inference latency. In internal tests using OpenAI's GPT-OSS open-weight model, Trn3 UltraServers delivered three times higher throughput per chip and four times faster response times compared to Trn2 UltraServers. Companies including Anthropic, Karakuri, Metagenomi, NetoAI, Ricoh and Splash Music are already reporting reduced training and inference costs, by up to 50% in some cases. AWS said its Bedrock service is already running production workloads on Trainium3. Decart, which focuses on real-time generative video, said it has achieved 4x faster frame generation at half the cost of GPUs on Trainium3. AWS noted that such capabilities could support large-scale interactive applications. The UltraServers are supported by an upgraded networking stack, including the new NeuronSwitch-v1, which provides twice the internal bandwidth, and a revised Neuron Fabric that brings inter-chip latency below 10 microseconds. The company said this reduces bottlenecks in distributed training and inference, especially for workloads such as agentic systems, mixture-of-experts architectures and reinforcement learning. UltraClusters 3.0 can connect thousands of the new servers, scaling to as many as one million Trainium chips -- 10 times the previous generation. AWS said this level of scale enables training multimodal models on trillion-token datasets and serving millions of concurrent users.
[12]
Nvidia and AWS expand partnership with specialized AI hardware and software - SiliconANGLE
Nvidia Corp. and Amazon Web Services Inc. announced the expansion of the two companies' collaboration on new chip technology, networking, cloud infrastructure, open models and physical AI. Fueling scale-up in infrastructure and custom silicon, AWS said at its re:Invent conference today that it would support Nvidia's NVLink Fusion, interconnect technology that lets custom central processing units and accelerators plug into Nvidia's rack-scale architecture for scaling artificial intelligence data centers. It will be used in deploying custom silicon, including AWS's upcoming Trainium4 chips for AI inference and model training and Graviton central processing units. "GPU compute demand is skyrocketing -- more compute makes smarter AI, smarter AI drives broader use and broader use creates demand for even more compute," said Nvidia founder and Chief Executive Jensen Huang. "The virtuous cycle of AI has arrived." In conjunction, AWS is expanding its accelerated computing offerings with Nvidia Blackwell architecture, including Nvidia HGX B300 and GB300 NVL72 systems. The company said these GPUs will be added to the AWS infrastructure backbone for AI Factories, a new AI cloud offering for customers worldwide, providing secure, regionally sovereign AI infrastructure for globally situated companies. For the public sector, AI Factories will help transform the federal supercomputing and AI landscape with a unified architecture. With these global datacenters, AWS plans to provide access to advanced AI services and capabilities to deploy and train massive models while maintaining absolute control of proprietary data. The partnership also expands the integration of Nvidia software with the AWS AI ecosystem. Nvidia Nemotron open models will now be available on Amazon Bedrock, the company's fully managed service providing access to a large number of foundation models. Developers can now use Nemotron Nano 2 and Nemotron Nano 2 VL to build specialized agent-based AI applications capable of processing text, code, images, and videos at scale. The two companies will also work together to co-engineer the software layer to accelerate data ingestion and processing for enterprise companies by combining technologies. Amazon OpenSearch Service, a managed, scalable search and analytics service, will now offer serverless GPU acceleration for vector index building, powered by Nvidia cuVS, an open-source library for GPU-accelerated vector search and data clustering. Production-ready agentic AI will gain from combining Strands Agents for agent development, the Nvidia NeMo Agent Toolkit for deep profiling and performance tuning, and Amazon Bedrock AgentCore for scalable agent infrastructure. Advancing AI-powered robotics requires high-quality and diverse datasets for training foundation models for physical AI, as well as frameworks for testing and validating them in simulation before deploying to the real world. Physical AI refers to artificial intelligence systems and models designed to interact with the real world through sensing, reasoning and acting through physical machines. These machines can include robots, self-driving cars, smart buildings and intelligent assistants that can interact with the physical world. Nvidia Cosmos provides world foundation models used to simulate the real world virtually for training and to produce synthetic data that's difficult to gather.
The platform speeds up the process of turning small amounts of visual data into large training sets for a wide variety of scenarios. Cosmos world foundation models are now available as Nvidia NIM microservices on Amazon EKS, the company's managed Kubernetes service. This will enable real-time robotics control and simulation workloads in the cloud. The platform also includes models that comprehend real-world physics, object interactions and motion, enabling reasoning about complex situations and predicting outcomes. This capability allows for the development of AI agents that can perform tasks with a deeper understanding of the real world.
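Since Nemotron models are being surfaced through Amazon Bedrock, here is a minimal sketch of how an application would call such a model with the standard Bedrock Converse API via boto3. The model identifier is a placeholder assumption for illustration; the actual Nemotron IDs would come from the Bedrock model catalog.

```python
import boto3

# Minimal sketch: invoking a Bedrock-hosted model through the Converse API.
# "us.nvidia.nemotron-nano-2:0" is a placeholder model ID, not a confirmed one;
# look up the real Nemotron identifiers in the Bedrock model catalog.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.nvidia.nemotron-nano-2:0",  # placeholder, assumption only
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the trade-offs of GPU-accelerated vector search."}],
    }],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

The same call pattern applies to any Bedrock-hosted model, which is the point of exposing Nemotron through the managed service rather than as a separate stack.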
[13]
ETtech Explainer: Amazon's latest Trainium chip and what it means for the race - The Economic Times
Amazon Web Services has launched Trainium3, its new AI chip, offering faster performance, more memory, and lower energy use. The next-generation Trainium4 will add Nvidia's NVLink Fusion for even bigger, faster AI servers. These chips give AWS more control, cut costs, and strengthen its position in the cloud AI market. Amazon Web Services (AWS) has spent years creating its own AI training hardware, and its newest release, Trainium3, is touted to perform far better than earlier versions. At its annual tech conference, re:Invent 2025, AWS introduced the Trainium3 UltraServer. This machine runs on the company's advanced 3-nanometre Trainium3 chip and uses AWS's own networking systems to support large-scale AI workloads. What's special about Trainium3? AWS says the new system is over four times faster and offers four times more memory. It is built for both training and running demanding AI applications. Thousands of UltraServers can be connected to provide an application with as many as one million Trainium3 chips, ten times more than before. Each UltraServer contains 144 chips. The company also says the chips use around 40% less energy than the previous generation. As global data-centre power use continues to climb, AWS is aiming to build systems that need less electricity, not more. "AWS Trainium is a family of purpose-built AI accelerators -- Trn1, Trn2, and Trn3 -- designed to deliver scalable performance and cost efficiency for training and inference across a broad range of generative AI workloads," the company said in a blog post. Customers such as Anthropic, Japan's LLM company Karakuri, SplashMusic, and Decart are already using Trainium3. According to TechCrunch, they have seen notable reductions in inference costs. Next up: Trainium4. At AWS' conference, the company also gave a first look at its next AI chip, Trainium4, which is already in development. AWS said the new chip will use a technology called "NVLink Fusion". This allows very fast communication between different chips and is one of Nvidia's key innovations. Nvidia has been encouraging other chip makers to use NVLink, and AWS now joins Intel and Qualcomm in adopting it. Trainium4 is expected to deliver a big performance boost and support NVLink Fusion, letting the system work smoothly with Nvidia GPUs. At the same time, it will continue to use Amazon's more affordable server rack design. The technology will also help AWS build larger AI servers that can communicate efficiently, which is important when training huge AI models. Through this partnership, customers will gain access to AWS's 'AI Factories', private AI infrastructure in their own data centres, for faster performance and readiness. However, Amazon has not announced a release date for Trainium4. If it follows previous patterns, more details may appear at next year's conference. What this means: These new chips could make AWS a more attractive option for large businesses seeking to train and deploy advanced models. Early tests suggest Trainium3 can cut costs by up to 50% compared to traditional GPU-based systems, making large-scale AI more affordable. Nvidia has long dominated top-tier AI hardware, but Trainium3 gives Amazon a strong alternative that is cheaper while still delivering high performance. By developing its own chips, AWS reduces its reliance on external supplies, such as Nvidia GPUs, for major AI projects.
This, in turn, means that Amazon gains more control over pricing and availability and can offer customers a lower-cost option without sacrificing capability, strengthening its position in the cloud AI market. Investors responded quickly as well -- the rise in Amazon's share price reflects confidence in its AI strategy and expectations of continued cloud growth. If Trainium3 sees wide adoption, analysts believe it will support stronger profits and reinforce Amazon's reputation as a leading technology company.
[14]
Amazon's AI Chip Trainium2 Hits Multi-Billion Dollar Revenue Run Rate As CEO Andy Jassy Unveils 4X Faster Successor - NVIDIA (NASDAQ:NVDA), Amazon.com (NASDAQ:AMZN)
Amazon.com Inc. (NASDAQ:AMZN) CEO Andy Jassy revealed more insights on the next generation of the company's AI chip, Trainium3, at the AWS re:Invent conference on Wednesday. Four-Times Faster, Lower Power Chip: Trainium3, Amazon's AI chip that competes with Nvidia Corp (NASDAQ:NVDA), is four times faster and consumes less power than its predecessor, Trainium2. The CEO also took to X to share some details about the Trainium2, which he says is already "a multi-billion-dollar revenue run-rate business" with over a million chips in production and over 100,000 companies using it for their primary Bedrock usage. Bedrock is Amazon's AI app development tool that allows companies to select from various AI models. "Trainium3 will deliver at least 4.4x more compute performance, 4x greater energy efficiency, and almost 4x more memory bandwidth than Trainium2," wrote Jassy. Jassy credits Trainium2's momentum to its superior price-performance compared to other GPUs, saying it delivers better results at a lower cost -- reflecting Amazon's broader strategy of pricing its in-house technology more competitively. Additionally, Jassy spoke about Nova Forge, which lets companies train earlier versions of frontier models with their own proprietary data, creating customized "Novellas" tailored to their needs. He also mentioned that AWS expects autonomous agents to become the primary way businesses gain value from AI, adding tools like the Kiro agent, DevOps Agent, and Security Agent to its lineup. Anthropic Key Revenue Contributor: Amazon's AI push has been gaining momentum, with the Trainium2 chips playing a pivotal role. The Trainium2 chips were instrumental in the launch of Project Rainier, Amazon's supercomputer, which went live in October. The new Rainier system positioned Amazon Web Services to capture a larger share of the booming AI training and inference demand, potentially adding billions in revenue growth through 2026. AWS CEO Matt Garman told CRN that the company's AI chip has been particularly successful with its cloud customers, with a significant portion of the billions in revenue coming from Anthropic, a startup backed by Amazon. Anthropic uses over 500,000 Trainium2 chips in Project Rainier. Moreover, Amazon's partnership with Nvidia has also been crucial to its AI chip success. At AWS re:Invent, Amazon and Nvidia deepened their long-running AI partnership by integrating Nvidia's accelerated compute technology with AWS's custom silicon and global cloud infrastructure. This strategy aims to provide enterprises with faster, more efficient options for training and deploying advanced AI systems. On a year-to-date basis, Amazon stock climbed 4.04%, per data from Benzinga Pro. On Thursday, it fell 1.41% to close at $229.11.
[15]
AWS Trainium3 AI Is 'The Best Inference Platform In The World,' CEO Says
Amazon Web Services CEO Matt Garman says the company's newly launched Trainium3 AI accelerator chips are the "best inference platform in the world," while its AI openness strategy is helping to win enterprise customer market share. "We make it so if you want to pull in Gemini models from Google, you can use those with our AgentCore," said Garman in an interview with CRN. "We think that innovation happens in lots of different places, not just inside the walls of Amazon. We want our customers to be able to benefit from that ecosystem of innovation," said Garman. "It is a difference in philosophies of how we operate, and our competitors don't always operate that same way." Thousands of AWS customers, developers and partners flocked to Las Vegas this week to attend AWS re:Invent 2025. The Seattle-based $132 billion cloud giant launched a slew of innovations at the conference, including the general availability of its new Trainium3 chips for AI training and inference. "Trainium3 is actually going to be the best inference platform in the world," Garman told CRN. AWS sees Bedrock with Trainium3 as the world's leading inference engine and a business that will one day possibly be as large as Amazon EC2. Garman and his team expect Trainium3 to be a major hit with enterprise customers as it will provide better price-performance options. "[Trainium3] will be the most efficient, most effective, the best cost performance, the lowest latency and the best throughput," Garman said. In the interview, Garman takes a deep dive into AWS' new Trainium3 AI chips, "supercharging" its agentic AI development offering Kiro, and why AWS' AI openness strategy will win versus competitors like Microsoft and Google Cloud.
[16]
Amazon Is 'All-In'
Amazon has ramped up the ASIC race by showcasing Trainium3 server configurations and next-gen Trainium4 ASICs, bringing massive performance and efficiency gains.

Amazon's Trainium3 UltraServers & Next-Gen Trainium4 Chips Showcase the Firm's 'All-In' Commitment in the AI Race

The race for custom silicon among Big Tech is indeed reaching newer levels, as we are seeing massive advancements from firms like Google, Meta, and now Amazon. At AWS re:Invent 2025, Amazon gave an insight into what customers should expect from the firm in the realm of ASICs, and one of the bigger developments from the company is the introduction of Trainium3 UltraServers, which are essentially AI systems that scale up to 144 chips in a single cluster and bring "up to 4.4x more compute performance, 4x greater energy efficiency, and almost 4x more memory bandwidth" compared to the previous generation. Trn3 UltraServers pack up to 144 Trainium3 chips into a single integrated system, delivering up to 4.4x more compute performance than Trainium2 UltraServers. This allows you to tackle AI projects that were previously impractical or too expensive by training models faster, cutting time from months to weeks, serving more inference requests from users simultaneously, and reducing both time-to-market and operational costs. The Trainium3 UltraServer features the newer NeuronSwitch-v1 technology, which debuts with upgraded bandwidth and fabric networking. It is Amazon's alternative to NVIDIA's NVLink solution; the idea is to interconnect Trainium ASICs into a massive 1 million-chip cluster, which is claimed to enable training on trillion-token datasets and large-scale inference. The UltraServers are an indicator that ASIC manufacturers are aggressively expanding their compute portfolios, given the compute constraints companies are facing. Amazon has also given us a look at next-gen Trainium4 ASICs, which are said to feature 6x higher FP4 performance and a massive increase in memory specifications. More importantly, Trainium4 will now support NVIDIA's NVLink technology as well, which means that customers looking to scale up their existing infrastructure by adding the Trainium stack, combined with Team Green's compute portfolio, can do so easily. Amazon has reported massive 'external interest' around its custom AI chips, with companies like Anthropic reporting reduced training costs. It appears that Amazon is 'all in' when it comes to the race for ASICs, and following Google's recent TPU announcements, it seems the retail giant is not holding back when it comes to advancing its compute portfolio.
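For a sense of the scale those numbers imply, here is a minimal back-of-the-envelope sketch in Python. The 144-chips-per-UltraServer and 1-million-chip figures come from the announcement as reported above; the helper function and constant names are illustrative, not AWS APIs.

```python
# Back-of-the-envelope math for the Trainium3 UltraServer scale-out claims.
# All figures are taken from the coverage above; nothing here queries AWS.

CHIPS_PER_ULTRASERVER = 144                    # Trn3 UltraServer capacity per the announcement
MAX_CLUSTER_CHIPS = 1_000_000                  # claimed maximum linked-cluster size
PREV_GEN_MAX_CHIPS = MAX_CLUSTER_CHIPS // 10   # "10x the previous generation" implies ~100,000

def ultraservers_needed(total_chips: int, chips_per_server: int = CHIPS_PER_ULTRASERVER) -> int:
    """Round up to whole UltraServers for a target chip count (ceiling division)."""
    return -(-total_chips // chips_per_server)

if __name__ == "__main__":
    servers = ultraservers_needed(MAX_CLUSTER_CHIPS)
    print(f"A {MAX_CLUSTER_CHIPS:,}-chip cluster implies roughly {servers:,} UltraServers.")
    print(f"Previous-generation ceiling (one tenth): {PREV_GEN_MAX_CHIPS:,} chips.")
```

Under these quoted figures, a full 1-million-chip cluster works out to roughly 6,945 UltraServers.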
[17]
Amazon Is Expanding Its AI Chip Ambitions. Should Nvidia Investors Be Worried? | The Motley Fool
Amazon is a leading artificial intelligence company that incorporates AI into its vast e-commerce and advertising platforms, as well as being the world's largest cloud computing company. However, it is now taking a key step in expanding its AI empire by rolling out a new AI chip that could significantly challenge the dominant position held by chipmaker Nvidia. Amazon's Trainium3 chip is the latest move in the company's efforts to scale up its custom AI hardware offerings. The company unveiled the chip Dec. 2 at its annual re:Invent conference in Las Vegas. "Trainium already represents a multibillion-dollar business today and continues to grow really rapidly," Amazon Web Services CEO Matt Garman said. Amazon has compelling reasons to develop its own chips: Nvidia's powerful graphics processing units (GPUs) are state-of-the-art, handling both the training and inference of the most advanced AI applications. But they're also extremely expensive. The Blackwell chips are reportedly priced between $30,000 and $40,000 each, and companies must cluster thousands of them in data centers to run AI programs. The Trainium3 chips can handle some AI tasks at lower prices. Dave Brown, a vice president at Amazon Web Services, told Yahoo! Finance that developers can save 30% to 40% by using Amazon chips instead of Nvidia's. And of course, the more work that Amazon does with its in-house chips, the less money it will need to spend with Nvidia. Amazon accounts for 7.5% of Nvidia's revenue, Bloomberg reports. So should Nvidia investors be worried? On its own, probably not. Amazon won't completely stop buying Nvidia products, and Nvidia has no shortage of customers that it can replace Amazon with, if needed. Nvidia CEO Jensen Huang has said that the company sold out of cloud GPUs and that its Blackwell sales are "off the charts." Revenue in the company's fiscal third quarter of 2026 (ended Oct. 26, 2025) was $57 billion, up 62% from a year ago. The company also reported data center revenue of $51.2 billion, representing a 66% increase from the same period in the previous year. Nvidia's guidance calls for revenue this fiscal year of $212.8 billion, followed by fiscal 2027 revenue of $316 billion as it begins selling its next-generation Rubin architecture. The company recently announced a deal with OpenAI, the maker of ChatGPT, for 10 gigawatts of computing power, and has also recently secured deals with Anthropic, Intel, Palantir Technologies, Alphabet, Microsoft, Oracle, and xAI. But the market's response to Amazon's new chip is worth watching, especially in the wake of Meta Platforms' reported negotiations to buy data center chips from Alphabet's Google. It was only a matter of time before Nvidia started facing growing competition from some of its biggest customers, and it will be up to Huang's leadership team to prove to customers that its GPUs are worth the premium price.
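As a rough illustration of what those percentages mean in dollars, here is a minimal Python sketch. The per-chip prices and the 30-40% saving are the figures quoted in the article; the 10,000-accelerator cluster size is a hypothetical chosen only for the example.

```python
# Illustrative cost comparison using only the figures quoted in this article:
# Blackwell GPUs reportedly priced at $30,000-$40,000 each, and a reported
# 30-40% saving from running the same work on Amazon's chips instead.
# The 10,000-accelerator cluster below is a made-up example, not an AWS or Nvidia figure.

GPU_PRICE_RANGE = (30_000, 40_000)     # reported Blackwell price per chip, USD
TRAINIUM_SAVINGS_RANGE = (0.30, 0.40)  # reported savings vs. Nvidia-based spend

def spend_ranges(num_chips: int):
    """Return (GPU spend range, implied Trainium-equivalent spend range) in USD."""
    gpu = (num_chips * GPU_PRICE_RANGE[0], num_chips * GPU_PRICE_RANGE[1])
    # Lowest implied spend: largest saving on the cheapest GPU build;
    # highest implied spend: smallest saving on the priciest GPU build.
    trn = (gpu[0] * (1 - TRAINIUM_SAVINGS_RANGE[1]), gpu[1] * (1 - TRAINIUM_SAVINGS_RANGE[0]))
    return gpu, trn

if __name__ == "__main__":
    gpu, trn = spend_ranges(10_000)
    print(f"GPU spend:                 ${gpu[0]/1e6:,.0f}M - ${gpu[1]/1e6:,.0f}M")
    print(f"Trainium-equivalent spend: ${trn[0]/1e6:,.0f}M - ${trn[1]/1e6:,.0f}M")
```

On that hypothetical cluster, a $300M-$400M GPU budget would translate into roughly $180M-$280M of equivalent spend under the quoted savings range.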
[18]
AWS says custom AI chip line becomes multibillion-dollar business, unveils plans for Trainium 4 - The Economic Times
Las Vegas: Amazon Web Services' custom AI chip line, Trainium, is now a multibillion-dollar business, with its Trainium 3 UltraServers available and work underway on the next-generation chip, Trainium 4. Speaking at the company's annual event, AWS re:Invent 2025 in Las Vegas, CEO Matt Garman said that Trainium is already a multibillion-dollar business, and that 1 million Trainium chips have been deployed so far. Trainium is AWS's in-house chip used for both training and inference, positioning the company to compete with Nvidia, the market leader in AI training. Customers using Trainium 3 include Anthropic, Karakuri, Metagenomics, Neto.ai, Ricoh, and Splashmusic. According to AWS, Trainium can reduce the cost of training and inference by up to 50%. In addition, the company is doubling down on AI agents. Garman said that the advent of agents represents a key inflection point, with the true value of AI still waiting to be unlocked. He added that reaching the age of agents requires reimagining every process -- and that demands powerful AI at the lowest possible cost. AWS's work on AI chips is a step in this direction. The company has also partnered with Nvidia, and says it is expanding its compute portfolio with P6e-GB300 UltraServers, featuring Nvidia's most advanced GPU architecture on AWS Cloud. The company has also expanded Nova, AWS's suite of AI models, introducing four new models focused on reasoning, multimodal processing, conversational AI, and code generation. (The reporter is attending AWS re:Invent 2025 in Las Vegas at the invitation of AWS)
[19]
Amazon to use Nvidia tech in AI chips, roll out new servers
Amazon's AWS cloud computing unit on Tuesday said it will adopt key Nvidia technology in future generations of its artificial intelligence computing chips as the firm ramps up efforts to get major AI customers using its services. AWS, or Amazon Web Services, said it will adopt a technology called "NVLink Fusion" in a future chip, with no specified release date, known as Trainium4. The NVLink technology creates speedy connections between different kinds of chips and is one of Nvidia's crown jewels. The companies made the announcement as part of AWS' annual week-long cloud computing conference in Las Vegas, which draws some 60,000 people. Amazon is expected to also show off new versions of its Nova AI model, initially unveiled last year. Nvidia has been pushing to get other chip firms to adopt its NVLink technology, with Intel, Qualcomm and now AWS on board. The technology will help AWS build bigger AI servers that can recognize and communicate with one another faster, a critical factor in training large AI models, in which thousands of machines must be strung together. As part of the Nvidia partnership, customers will have access to what AWS is calling AI Factories, exclusive AI infrastructure inside their own data centers for greater speed and readiness. "Together, Nvidia and AWS are creating the compute fabric for the AI industrial revolution - bringing advanced AI to every company, in every country, and accelerating the world's path to intelligence," Nvidia CEO Jensen Huang said in a statement. Separately, Amazon said it is rolling out new servers based on a chip called Trainium3. The new servers, available on Tuesday, each contain 144 chips and have more than four times more computing power than AWS' previous generation of AI servers, while using 40% less power, Dave Brown, vice president of AWS compute and machine learning services, told Reuters. Brown did not give absolute figures on power or performance, but said AWS aims to compete with rivals - including Nvidia - based on price. "We've got to prove to them that we have a product that gives them the performance that they need and get a right price point so they get that price-performance benefit," Brown said. "That means that they can say, 'Hey, yeah, that's the chip I want to go and use.'"
[21]
Trainium 3 explained: Amazon's new AI chip and its NVIDIA-ready roadmap
AWS unveils Trainium 3 to deliver faster, more efficient frontier model training

Amazon's newest AI chip arrives at a moment when the global demand for compute is rising faster than the hardware ecosystem can keep up. Trainium 3 is not just a faster successor. It is the centerpiece of a strategy that aims to make AWS a core destination for training frontier-scale models while reducing the industry's overreliance on GPUs. The announcement also teased something even more consequential: a roadmap that brings Amazon's hardware closer to Nvidia's world instead of competing against it from the sidelines.

Model sizes are ballooning, data pipelines are scaling, and training runs now stretch into millions of GPU-hours. For most companies, access to the required hardware is the single biggest bottleneck. AWS wants to close the gap with a chip built specifically for AI training workloads, not adapted from general computing tasks. Trainium 3 is manufactured on a 3-nanometer process and delivers up to four times the performance of its predecessor while using significantly less power. In practice, this means faster iteration cycles for anyone building large models and lower energy costs for organizations running long multistage training jobs. AWS also introduced the UltraServer, a dense system that houses 144 of these chips and can be linked with thousands of others in massive clusters. This kind of scale is designed to support everything from enterprise models to experimental systems that push the limits of today's AI research.

AWS has tried for years to establish itself as a viable alternative to Nvidia hardware, but the market reality is clear. Developers are deeply tied to GPU-optimized frameworks, toolchains, and workflows. Replacing Nvidia outright is neither easy nor realistic. With Trainium 3 and the roadmap behind it, AWS is shifting toward a hybrid approach. The next generation, Trainium 4, will support Nvidia's high-speed NVLink Fusion interconnect. That matters because it enables mixed clusters where Trainium chips and Nvidia GPUs work together instead of in separate environments. It also reduces the friction for teams that want to explore non-GPU accelerators but aren't ready to overhaul their entire stack. Compatibility becomes a bridge, not a threat.

This move positions AWS differently in the AI infrastructure race. It signals that the company understands the importance of interoperability and wants to attract developers by meeting them halfway. Rather than building a walled garden, AWS is trying to expand the range of hardware choices for customers who want performance, flexibility, and lower costs. For cloud buyers, this opens up practical advantages. Workloads tuned for GPUs can continue running on familiar infrastructure, while exploratory or large-scale training tasks can shift to Trainium-based clusters that promise better efficiency. For enterprises, it offers a way to scale without fighting for scarce GPUs or paying premium prices in secondary markets. If Trainium 3 delivers on its claims, it could push other cloud providers to invest more aggressively in custom silicon. It also intensifies competition around energy efficiency, a metric that will be central as AI growth collides with sustainability concerns.
More significantly, the Nvidia-friendly roadmap hints at a future where cloud platforms become modular hardware ecosystems rather than single-vendor silos. The AI industry has spent years chasing raw power. The next phase will value flexibility just as much, and AWS is betting that customers want both. Trainium 3 is the hardware expression of that bet, and Trainium 4's Nvidia compatibility shows how AWS intends to win developers without forcing them to abandon what already works. At a time when every major player is trying to secure its place in the AI supply chain, Amazon's newest chip positions AWS not as a challenger on the outskirts, but as a platform aiming to sit at the center of how frontier models are built.
AWS introduced AI Factories, a fully managed on-premises AI solution built with Nvidia, alongside its new Trainium3 chip that delivers four times more performance than its predecessor. The announcements signal Amazon's dual strategy to accelerate cloud computing capacity while addressing data sovereignty concerns for enterprises and governments requiring local AI infrastructure.
Amazon Web Services announced AWS AI Factories at its re:Invent 2025 conference, a fully managed solution that allows corporations and governments to run AI systems within their own data centers. The service addresses data sovereignty concerns by keeping sensitive information on-premises while AWS handles hardware installation and management. Customers provide the physical space and power, while AWS supplies the AI infrastructure, including compute, storage, and database services that operate like a private AWS Region [4]. This collaboration with Nvidia enables customers to choose between Nvidia Blackwell GPUs or Amazon's new Trainium3 AI chip, combined with AWS networking, storage, and security tools. The service integrates with Amazon Bedrock for model selection and SageMaker for AI training and inference, providing a comprehensive platform for AI workloads without requiring customers to acquire or install hardware themselves [4]. AWS CEO Matt Garman revealed the concept originated from work with Saudi Arabia's Humain to build an "AI Zone" featuring up to 150,000 AI chips [4].
AWS formally launched its Trainium3 UltraServer, powered by state-of-the-art 3-nanometer Trainium3 chip technology [2]. The third-generation system delivers four times more compute performance, memory bandwidth, and energy efficiency compared to previous generations [5]. Each UltraServer hosts 144 chips, and thousands can be linked together to provide up to 1 million Trainium3 chips, ten times the capacity of the previous generation [2].
The Trainium3 accelerator features dual chiplets equipped with 144 GB of HBM3E memory and peak bandwidth of 4.9 TB/s [3]. With FP8 performance reaching 2,517 MXFP8 TFLOPS per chip, the Trn3 UltraServer packs 0.36 ExaFLOPS of FP8 performance across its 144-chip configuration, matching Nvidia's NVL72 GB300 rack-scale AI systems [3]. Early customers including Anthropic, Japan's Karakuri, and Decart have already reduced AI training and inference costs by up to 50% using the new accelerators [2][5].
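As a quick sanity check, the quoted per-chip and per-server numbers are consistent with each other; the minimal Python sketch below multiplies them out using only the figures cited above.

```python
# Sanity-check the quoted UltraServer FP8 throughput from the per-chip figure.
# All constants are numbers quoted in the coverage above, not measurements.

PER_CHIP_MXFP8_TFLOPS = 2_517   # quoted Trainium3 per-chip MXFP8 throughput
CHIPS_PER_ULTRASERVER = 144     # quoted Trn3 UltraServer configuration

total_tflops = PER_CHIP_MXFP8_TFLOPS * CHIPS_PER_ULTRASERVER
total_exaflops = total_tflops / 1_000_000  # 1 ExaFLOP = 1,000,000 TFLOPS

print(f"{total_tflops:,} TFLOPS = {total_exaflops:.2f} ExaFLOPS per UltraServer")
# Prints: 362,448 TFLOPS = 0.36 ExaFLOPS per UltraServer
```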
AWS teased its next-generation Trainium4 chip, already in development, which will support NVLink Fusion, Nvidia's high-speed chip interconnect technology [2]. This integration will allow Trainium4-powered systems to interoperate with Nvidia GPUs while leveraging Amazon's lower-cost server rack technology [2]. The move could make it easier to attract AI applications built with Nvidia's CUDA platform, which has become the de facto standard for AI workloads [2].
While AWS AI innovations capture attention, Wall Street analysts emphasize that cloud computing capacity expansion matters most for AWS revenue growth [5]. AWS has added more than 3.8 gigawatts in the past 12 months and plans to add over 12 gigawatts by year-end 2027, potentially supporting up to $150 billion in incremental annual AWS revenue if demand remains strong, according to Wells Fargo analysts [5]. Each incremental gigawatt translates to roughly $3 billion of annual cloud revenue [5].
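To make that rule of thumb concrete, here is a minimal Python sketch applying the quoted roughly $3 billion-per-gigawatt ratio to the stated capacity additions. The gigawatt figures and the ratio are the analysts' estimates cited above; the output is an illustration, not a forecast.

```python
# Illustrative translation of capacity additions into annual revenue using the
# analysts' rule of thumb quoted above (~$3B of annual cloud revenue per GW).
# These are the article's figures, not AWS guidance.

REVENUE_PER_GW_BILLIONS = 3.0   # quoted rule of thumb, USD billions per gigawatt

def implied_annual_revenue(gigawatts: float) -> float:
    """Annual cloud revenue (USD billions) implied by a given amount of capacity."""
    return gigawatts * REVENUE_PER_GW_BILLIONS

if __name__ == "__main__":
    added_last_12_months = 3.8   # GW added in the past 12 months, per the article
    planned_by_2027 = 12.0       # GW planned by year-end 2027, per the article
    print(f"Past 12 months:     ~${implied_annual_revenue(added_last_12_months):.1f}B implied annual revenue")
    print(f"Planned by 2027:    ~${implied_annual_revenue(planned_by_2027):.1f}B implied annual revenue")
```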
AWS re-accelerated to 20.2% year-over-year growth in Q3, the fastest pace since 2022, as the company addresses supply constraints that limited earlier expansion [5]. The dual approach of custom AI chips and on-premises AI infrastructure aims to maintain AWS's lead over Microsoft Azure and Google Cloud during intense competition for AI adoption [5].

AWS faces stiff competition in the on-premises AI market. Dell's AI Factory with Nvidia captured 3,000 customers and shipped $15.6 billion in AI servers year to date, while HPE's private AI cloud won over 300 new customers [4]. Forrester analysts note that AI spending faces scrutiny as customers demand clear returns on investment, with free cash flow tightening and warnings of a potential dot-com-style bubble [4]. Microsoft has also deployed its own Nvidia AI Factories for OpenAI workloads and outlined data centers addressing data sovereignty in local countries. The trend toward enterprise data centers and hybrid clouds represents an ironic shift for major cloud providers, reminiscent of infrastructure strategies from 2009.