Amazon Web Services Partners with Cerebras to Deliver Faster AI Inference by Combining Their AI Chips

Amazon Web Services announced a partnership with Cerebras Systems to combine their AI chips in a new service launching in the second half of 2026. The collaboration pairs AWS Trainium 3 processors with Cerebras' Wafer Scale Engine to accelerate AI inference computing, particularly for chatbots and coding tools. This marks the first time a major hyperscaler has committed to using Cerebras technology.

AWS and Cerebras Join Forces to Enhance AI Inference Computing

Amazon Web Services has struck a deal with Cerebras Systems to offer Cerebras AI chips on Amazon's cloud, marking a significant shift in the AI computing infrastructure landscape. The partnership will combine AWS Trainium chips with Cerebras' wafer-scale technology to deliver what the companies claim will be faster AI inference for large language models than currently available solutions [1]. AWS, the largest provider of cloud computing power, plans to begin offering the new service in the second half of 2026, though financial terms remain undisclosed [1].

Source: Market Screener

For Cerebras, valued at $23.1 billion and planning an initial public offering, securing Amazon as its first hyperscaler customer represents a major validation of its unique chip design [2]. The startup already made waves earlier this year by signing a $10 billion deal to supply AI chips to OpenAI, the maker of ChatGPT [3]. According to Cerebras CEO Andrew Feldman, the partnership will "bring the fastest inference to a global customer base," leveraging AWS's reach from individual developers to the largest banks in the world [2].

Source: Bloomberg

How the Divide and Conquer Strategy Works

The technical approach behind this collaboration centers on AI inference, the process by which trained AI systems respond to user queries. The companies will employ what Feldman describes as a "divide and conquer" strategy, splitting inference tasks into prefill and decode stages [2]. Trainium 3 processors will handle the prefill phase, transforming user requests from human language into tokens that AI systems understand. Cerebras' Wafer Scale Engine will then take over the decode stage, generating the actual responses users seek [1].
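To make the split concrete, here is a minimal Python sketch of disaggregated inference under this division of labor. The class names, the toy tokenizer, and the hand-off structure are illustrative assumptions, not AWS or Cerebras APIs; in a real system the prefill stage also produces the model's attention (KV) cache, which is what gets handed across the network to the decode hardware.

```python
# Illustrative sketch of "divide and conquer" (disaggregated) inference.
# PrefillBackend stands in for the Trainium 3 pool; DecodeBackend stands in
# for the Cerebras Wafer Scale Engine. All names here are hypothetical.

from dataclasses import dataclass


@dataclass
class PrefillState:
    tokens: list[int]   # the tokenized prompt
    kv_cache: dict      # attention state handed off to the decode stage


class PrefillBackend:
    """Prefill: one compute-heavy pass over the whole prompt (Trainium 3's job)."""

    def run(self, prompt: str) -> PrefillState:
        tokens = [ord(ch) for ch in prompt]  # toy tokenizer, for illustration only
        return PrefillState(tokens, {"prompt_len": len(tokens)})


class DecodeBackend:
    """Decode: latency-sensitive, token-by-token generation (the WSE's job)."""

    def run(self, state: PrefillState, max_new_tokens: int = 8) -> str:
        # Placeholder generation loop; a real model samples one token per step,
        # reusing the KV cache built during prefill.
        return "".join(f"<tok{state.kv_cache['prompt_len'] + i}>"
                       for i in range(max_new_tokens))


def disaggregated_inference(prompt: str) -> str:
    state = PrefillBackend().run(prompt)  # stage 1 on one chip family
    # ...network hand-off of the prefill state happens here...
    return DecodeBackend().run(state)     # stage 2 on the other


print(disaggregated_inference("Why is the sky blue?"))
```

That hand-off in the middle is exactly where disaggregated designs usually pay a communication penalty, which is the drawback described next.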

This approach, known as inference disaggregation, typically faces a significant challenge: communication between the separate components can slow down processing. The partnership aims to overcome that drawback by using specialized AI chips that handle each inference stage more responsively [1]. The improvement should be most noticeable in applications requiring back-and-forth interaction, such as chatbots and coding tools that work in multiple stages. AWS Vice President Nafea Bshara noted that while a Trainium-only service will likely remain cheaper, the combined chip offering will appeal to use cases "where time is money" [1].

Source: Reuters

Positioning to Compete with Nvidia

The collaboration arrives as competition in the AI chip market intensifies, with multiple players seeking to challenge Nvidia, which dominates the graphics processing unit (GPU) market. Cerebras has pioneered a fundamentally different approach: massive wafer-scale chips that do not rely on the expensive high-bandwidth memory used by Nvidia's flagship processors [2]. Its CS-3 systems can process enormous amounts of data in a single pass, a unique architecture in the AI hardware landscape [4].

Interestingly, analysts expect Nvidia to unveil a similar strategy next week, detailing how it plans to combine its own GPUs with chips from Groq, a startup Nvidia acquired for $17 billion in late December [5]. When asked about the comparison, Amazon said it could not assess Nvidia's unrevealed offering in detail, but emphasized that its Trainium 3 program is "just months away from running production workloads" while the timeline for the Nvidia-Groq pairing remains unclear. Amazon expects both Trainium 3 and the future Trainium 4 chips to "continue to lead in price-performance versus merchant GPUs" [2].

Access Through Amazon Bedrock and Data Centers

Cerebras chips will be deployed inside AWS data centers and connected to Amazon's Trainium 3 custom AI chips through proprietary networking technology [3]. Customers will gain access through Amazon Bedrock in the next couple of months, making AWS the first cloud provider to offer Cerebras' specialized hardware for disaggregated inference [4]. Later this year, AWS plans to add support for Amazon Nova and other open-source models on this infrastructure [4].
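For developers, access should look like any other Bedrock invocation. Below is a minimal sketch using boto3's Converse API; the model ID is a stand-in, since identifiers for the Cerebras-accelerated endpoints have not been published.

```python
# Sketch of calling a model via Amazon Bedrock using boto3's Converse API.
# The modelId is a placeholder (an existing Amazon Nova ID); the IDs for the
# Cerebras-accelerated offering are assumptions until AWS announces them.

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # swap in the real model ID once announced
    messages=[
        {"role": "user",
         "content": [{"text": "Explain prefill vs. decode in one paragraph."}]},
    ],
    inferenceConfig={"maxTokens": 256},
)

print(response["output"]["message"]["content"][0]["text"])
```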

According to AWS Vice President David Brown, "The result will be inference that's an order of magnitude faster and higher performance than what's available today" [4]. The partnership addresses voracious demand for AI computing infrastructure, and the two companies have been preparing the collaboration for several years [1]. AWS says it will deploy as many of the chips as demand requires, signaling confidence in the market's appetite for the combined solution. While Amazon remains a major Nvidia customer, the deal underscores its commitment to improving data center economics and offering distinctive services through its own chip designs and strategic collaborations [1].
