Curated by THEOUTPOST
On Wed, 12 Mar, 12:08 AM UTC
4 Sources
[1]
Cerebras just announced 6 new AI datacenters that process 40M tokens per second -- and it could be bad news for Nvidia
Cerebras Systems, an AI hardware startup that has been steadily challenging Nvidia's dominance in the artificial intelligence market, announced Tuesday a significant expansion of its data center footprint and two major enterprise partnerships that position the company to become the leading provider of high-speed AI inference services.

The company will add six new AI data centers across North America and Europe, increasing its inference capacity twentyfold to over 40 million tokens per second. The expansion includes facilities in Dallas, Minneapolis, Oklahoma City, Montreal, New York, and France, with 85% of the total capacity located in the United States.

"This year, our goal is to truly satisfy all the demand and all the new demand we expect will come online as a result of new models like Llama 4 and new DeepSeek models," said James Wang, Director of Product Marketing at Cerebras, in an interview with VentureBeat. "This is our huge growth initiative this year to satisfy almost unlimited demand we're seeing across the board for inference tokens."

The data center expansion represents the company's ambitious bet that the market for high-speed AI inference -- the process where trained AI models generate outputs for real-world applications -- will grow dramatically as companies seek faster alternatives to GPU-based solutions from Nvidia.

Strategic partnerships that bring high-speed AI to developers and financial analysts

Alongside the infrastructure expansion, Cerebras announced partnerships with Hugging Face, the popular AI developer platform, and AlphaSense, a market intelligence platform widely used in the financial services industry.

The Hugging Face integration will allow its five million developers to access Cerebras Inference with a single click, without having to sign up for Cerebras separately. This represents a major distribution channel for Cerebras, particularly for developers working with open-source models like Llama 3.3 70B.

"Hugging Face is kind of the GitHub of AI and the center of all open source AI development," Wang explained. "The integration is super nice and native. You just appear in their inference providers list. You just check the box and then you can use Cerebras right away."

The AlphaSense partnership represents a significant enterprise customer win, with the financial intelligence platform switching from what Wang described as a "global, top three closed-source AI model vendor" to Cerebras. The company, which serves approximately 85% of Fortune 100 companies, is using Cerebras to accelerate its AI-powered search capabilities for market intelligence.

"This is a tremendous customer win and a very large contract for us," Wang said. "We speed them up by 10x, so what used to take five seconds or longer basically becomes instant on Cerebras."

How Cerebras is winning the race for AI inference speed as reasoning models slow down

Cerebras has been positioning itself as a specialist in high-speed inference, claiming its Wafer-Scale Engine (WSE-3) processor can run AI models 10 to 70 times faster than GPU-based solutions. This speed advantage has become increasingly valuable as AI models evolve toward more complex reasoning capabilities.

"If you listen to Jensen's remarks, reasoning is the next big thing, even according to Nvidia," Wang said, referring to Nvidia CEO Jensen Huang.
"But what he's not telling you is that reasoning makes the whole thing run 10 times slower because the model has to think and generate a bunch of internal monologue before it gives you the final answer." This slowdown creates an opportunity for Cerebras, whose specialized hardware is designed to accelerate these more complex AI workloads. The company has already secured high-profile customers including Perplexity AI and Mistral AI, who use Cerebras to power their AI search and assistant products, respectively. "We help Perplexity become the world's fastest AI search engine. This just isn't possible otherwise," Wang said. "We help Mistral achieve the same feat. Now they have a reason for people to subscribe to Le Chat Pro, whereas before, your model is probably not the same cutting-edge level as GPT-4." The compelling economics behind Cerebras' challenge to OpenAI and Nvidia Cerebras is betting that the combination of speed and cost will make its inference services attractive even to companies already using leading models like GPT-4. Wang pointed out that Meta's Llama 3.3 70B, an open-source model that Cerebras has optimized for its hardware, now scores the same on intelligence tests as OpenAI's GPT-4, while costing significantly less to run. "Anyone who is using GPT-4 today can just move to Llama 3.3 70B as a drop-in replacement," he explained. "The price for GPT-4 is [about] $4.40 in blended terms. And Llama 3.3 is like 60 cents. We're about 60 cents, right? So you reduce cost by almost an order of magnitude. And if you use Cerebras, you increase speed by another order of magnitude." Inside Cerebras' tornado-proof data centers built for AI resilience The company is making substantial investments in resilient infrastructure as part of its expansion. Its Oklahoma City facility, scheduled to come online in June 2025, is designed to withstand extreme weather events. "Oklahoma, as you know, is a kind of a tornado zone. So this data center actually is rated and designed to be fully resistant to tornadoes and seismic activity," Wang said. "It will withstand the strongest tornado ever recorded on record. If that thing just goes through, this thing will just keep sending Llama tokens to developers." The Oklahoma City facility, operated in partnership with Scale Datacenter, will house over 300 Cerebras CS-3 systems and features triple redundant power stations and custom water-cooling solutions specifically designed for Cerebras' wafer-scale systems. From skepticism to market leadership: How Cerebras is proving its value The expansion and partnerships announced today represent a significant milestone for Cerebras, which has been working to prove itself in an AI hardware market dominated by Nvidia. "I think what was reasonable skepticism about customer uptake, maybe when we first launched, I think that is now fully put to bed, just given the diversity of logos we have," Wang said. The company is targeting three specific areas where fast inference provides the most value: real-time voice and video processing, reasoning models, and coding applications. "Coding is one of these kind of in-between reasoning and regular Q&A that takes maybe 30 seconds to a minute to generate all the code," Wang explained. "Speed directly is proportional to developer productivity. So having speed there matters." By focusing on high-speed inference rather than competing across all AI workloads, Cerebras has found a niche where it can claim leadership over even the largest cloud providers. 
"Nobody generally competes against AWS and Azure on their scale. We don't obviously reach full scale like them, but to be able to replicate a key segment... on the high-speed inference front, we will have more capacity than them," Wang said. Why Cerebras' US-centric expansion matters for AI sovereignty and future workloads The expansion comes at a time when the AI industry is increasingly focused on inference capabilities, as companies move from experimenting with generative AI to deploying it in production applications where speed and cost-efficiency are critical. With 85% of its inference capacity located in the United States, Cerebras is also positioning itself as a key player in advancing domestic AI infrastructure at a time when technological sovereignty has become a national priority. "Cerebras is turbocharging the future of U.S. AI leadership with unmatched performance, scale and efficiency - these new global datacenters will serve as the backbone for the next wave of AI innovation," said Dhiraj Mallick, COO of Cerebras Systems, in the company's announcement. As reasoning models like DeepSeek R1 and OpenAI's o3 become more prevalent, the demand for faster inference solutions is likely to grow. These models, which can take minutes to generate answers on traditional hardware, operate near-instantaneously on Cerebras systems, according to the company. For technical decision makers evaluating AI infrastructure options, Cerebras' expansion represents a significant new alternative to GPU-based solutions, particularly for applications where response time is critical to user experience. Whether the company can truly challenge Nvidia's dominance in the broader AI hardware market remains to be seen, but its focus on high-speed inference and substantial infrastructure investment demonstrates a clear strategy to carve out a valuable segment of the rapidly evolving AI landscape.
[2]
Cerebras vs. Nvidia: The AI hardware battle just got personal
Cerebras Systems, an AI hardware startup, announced the expansion of its data center footprint with six new AI data centers in North America and Europe, significantly increasing its inference capacity to over 40 million tokens per second. The announcement, made Tuesday, positions the company to compete more aggressively against Nvidia in the artificial intelligence market.

The new facilities will be established in Dallas, Minneapolis, Oklahoma City, Montreal, New York, and France, with 85% of the total capacity based in the United States. According to James Wang, director of product marketing at Cerebras, the company's aim this year is to meet the expected surge in demand for inference tokens driven by new AI models like Llama 4 and DeepSeek. The inference capacity will increase from 2 million to over 40 million tokens per second by Q4 2025 across the planned eight data centers. Wang highlighted that this strategic expansion is a critical part of the company's initiative to deliver high-speed AI inference services, a market traditionally dominated by Nvidia's GPU-based solutions.

In addition to the infrastructure expansion, Cerebras announced partnerships with Hugging Face, a prominent AI developer platform, and AlphaSense, a market intelligence platform. The integration with Hugging Face will enable its five million developers to access Cerebras Inference seamlessly, facilitating the use of open-source models like Llama 3.3 70B. Wang described Hugging Face as the "GitHub of AI," noting that the integration allows developers to activate Cerebras services with a single click. The partnership with AlphaSense marks a significant transition for the financial intelligence platform, which is shifting from a leading AI model vendor to Cerebras to enhance its AI-driven market intelligence search capabilities, reportedly speeding up processing times tenfold.

Cerebras aims to establish itself as a leader in high-speed AI inference by leveraging its Wafer-Scale Engine (WSE-3) processor, which is said to outperform GPU solutions by a factor of 10 to 70. This performance boost is particularly pertinent as AI models incorporate more complex reasoning capabilities, which generally result in slower processing times. Wang indicated that although Nvidia acknowledges the significance of reasoning models, these typically require longer computation times. Cerebras has already partnered with companies like Perplexity AI and Mistral AI, which use its technology to power their AI search engines and assistants, respectively. The Cerebras hardware reportedly achieves inference speeds up to 13 times faster than conventional GPU-based solutions across multiple AI models, including Llama 3.3 70B and DeepSeek-R1 70B.

The economic advantages of Cerebras' offering are also significant. Wang stated that Meta's Llama 3.3 70B model, optimized for Cerebras systems, delivers performance comparable to OpenAI's GPT-4 while incurring significantly lower operational costs. He noted that the operational cost for GPT-4 is approximately $4.40, while for Llama 3.3 it is around 60 cents, representing a potential cost reduction of nearly an order of magnitude when adopting Cerebras technology. Cerebras also plans to invest in disaster-resistant infrastructure for its expansion, with a new facility in Oklahoma City designed to withstand extreme weather conditions.
Scheduled to open in June 2025, this facility will feature over 300 Cerebras CS-3 systems, triple-redundant power stations, and specialized water-cooling systems to support its hardware. This expansion reflects Cerebras' strategy to maintain a competitive edge in an AI hardware market largely dominated by Nvidia. Wang asserts that the skepticism surrounding the company's customer uptake has been dispelled by the diversity of its client base, as it aims to enhance performance in sectors including voice and video processing, reasoning models, and coding applications. With 85% of its capacity located in the U.S., Cerebras is positioning itself as a significant contributor to domestic AI infrastructure, a priority as discussions around technological sovereignty grow. Dhiraj Mallick, COO of Cerebras Systems, stated that the new data centers will play a vital role in the evolution of AI innovation.
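As a quick sanity check on the pricing claim, the quoted figures imply roughly a 7x gap, close to but not quite a full order of magnitude. The snippet below assumes both numbers are blended prices per million tokens, a unit the article does not state.

```python
# Back-of-the-envelope check on the quoted blended prices.
gpt4_price = 4.40        # USD, as quoted; assumed to be per 1M blended tokens
llama_70b_price = 0.60   # USD, as quoted, on the same assumed basis

ratio = gpt4_price / llama_70b_price
print(f"GPT-4 vs. Llama 3.3 70B blended price ratio: {ratio:.1f}x")  # ~7.3x
```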
[3]
Cerebras to deploy AI DCs in North America, France in 2025
Plus, startup's inference service makes debut on Hugging Face

Cerebras has begun deploying more than a thousand of its dinner-plate-sized accelerators across North America and parts of France as the startup looks to establish itself as one of the largest and fastest suppliers of AI inference services.

The expansion, confirmed at the HumanX AI conference in Las Vegas, will see Cerebras - by the end of this year - bring online new datacenters in Texas, Minnesota, Oklahoma, and Georgia, along with its first facilities in Montreal, Canada, and France. Of these facilities, Cerebras will maintain full ownership of the Oklahoma City and Montreal sites, while the remainder are jointly operated under an agreement with Emirati financier G42 Cloud.

The largest of the US facilities will be located in Minneapolis, Minnesota, and will feature 512 of its CS-3 AI accelerators totaling 64 exaFLOPS of FP16 compute when it comes online in the second quarter of 2025. Unlike many of the large-scale AI supercomputers and datacenter buildouts announced over the past year, Cerebras's will be powered by its in-house accelerators.

Announced a year ago this week, Cerebras's CS-3 systems feature a wafer-scale processor measuring 46,225 mm², which contains four trillion transistors spread across 900,000 cores and 44 GB of SRAM. Next to the hundreds of thousands of GPUs hyperscalers and cloud providers are already deploying, a thousand-plus CS-3s might not sound like that much compute until you realize each is capable of producing 125 petaFLOPS of highly sparse FP16 performance, compared to just 2 petaFLOPS on an H100 or H200 and 5 petaFLOPS on Nvidia's most powerful Blackwell GPUs.

When the CS-3 made its debut, Cerebras was still focused exclusively on model training. However, since then the company has expanded its offering to inference. The company claims it can serve Llama 3.1 70B at up to 2,100 tokens a second. This is possible, in part, because large language model (LLM) inferencing is primarily memory-bound, and while a single CS-3 doesn't offer much in terms of capacity, it makes up for that in memory bandwidth, which peaks at 21 petabytes per second. An H100, for reference, offers nearly twice the memory capacity, but just 3.35 TBps of memory bandwidth.

However, this alone only gets Cerebras to around 450 tokens a second. As we've previously discussed, the remaining performance is achieved via a technique called speculative decoding, which uses a small draft model to generate the initial output, while a larger model acts as a fact-checker in order to preserve accuracy. So long as the draft model doesn't make too many mistakes, the performance improvement can be dramatic, up to a 6x increase in tokens per second, according to some reports.
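The draft-then-verify mechanics described above are straightforward to sketch. The toy example below is not Cerebras's implementation; it simply illustrates the standard speculative-sampling accept/reject scheme using dummy probability models over a tiny vocabulary.

```python
# Toy sketch of speculative decoding: a cheap draft model proposes tokens,
# an expensive target model verifies them, and rejected proposals are
# resampled from the residual distribution so the output matches what the
# target model alone would have produced.
import numpy as np

VOCAB = 16          # tiny vocabulary for illustration
DRAFT_LEN = 4       # tokens the draft model proposes per round
rng = np.random.default_rng(0)

def draft_probs(context):
    """Cheap 'draft' model: a dummy next-token distribution."""
    h = (sum(context) * 2654435761) % (2**32)
    logits = np.sin(np.arange(VOCAB) + h % 97)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def target_probs(context):
    """Expensive 'target' model: a different, sharper dummy distribution."""
    h = (sum(context) * 40503) % (2**32)
    logits = 2.0 * np.cos(np.arange(VOCAB) * 0.7 + h % 89)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(context):
    """One draft-and-verify round; returns the tokens actually emitted."""
    # 1. Draft model proposes DRAFT_LEN tokens autoregressively (cheap).
    proposed, q = [], []
    ctx = list(context)
    for _ in range(DRAFT_LEN):
        dist = draft_probs(ctx)
        tok = rng.choice(VOCAB, p=dist)
        proposed.append(tok)
        q.append(dist)
        ctx.append(tok)

    # 2. Target model scores every proposed position (one "parallel" pass).
    emitted = []
    ctx = list(context)
    for tok, q_dist in zip(proposed, q):
        p_dist = target_probs(ctx)
        # Accept with prob min(1, p/q); on reject, resample from max(0, p - q).
        if rng.random() < min(1.0, p_dist[tok] / q_dist[tok]):
            emitted.append(tok)
            ctx.append(tok)
        else:
            residual = np.maximum(p_dist - q_dist, 0.0)
            residual /= residual.sum()
            emitted.append(rng.choice(VOCAB, p=residual))
            return emitted  # stop at the first rejection
    # 3. All drafts accepted: take one bonus token from the target model.
    emitted.append(rng.choice(VOCAB, p=target_probs(ctx)))
    return emitted

context = [1, 2, 3]
for _ in range(5):
    new = speculative_step(context)
    context += new
    print(f"emitted {len(new)} token(s) this round: {new}")
```

When most proposals are accepted, each expensive verification pass yields several tokens instead of one, which is where the multi-x speedups reported for the technique come from.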
Amid a sea of GPU bit barns peddling managed inference services, Cerebras is leaning heavily on its accelerator's massive bandwidth advantage and experience with speculative decoding to differentiate itself, especially as "reasoning" models like DeepSeek-R1 and QwQ become more common. Because these models rely on chain-of-thought reasoning, a response could potentially require thousands of tokens of "thought" to reach a final answer, depending on its complexity. So the faster you can churn out tokens, the less time folks are left waiting for a response, and, presumably, the more folks are willing to pay for the privilege.

Of course, with just 44 GB of memory per accelerator, supporting larger models remains Cerebras's sore spot. Llama 3.3 70B, for instance, requires at least four of Cerebras's CS-3s to run at full 16-bit precision. A model like Llama 3.1 405B - which Cerebras has demoed - would need more than 20 to run with a meaningful context size. As fast as Cerebras's SRAM might be, the company is still some way from serving up multi-trillion-parameter scale models at anything close to the speeds it's advertising.

With that said, the speed of Cerebras's inference service has already helped it win contracts with Mistral AI and, most recently, Perplexity. This week, the company announced yet another customer win with market intelligence platform AlphaSense, which, we're told, plans to swap three closed-source model providers for an open model running on Cerebras's CS-3s.

Finally, as part of its infrastructure buildout, Cerebras aims to extend API access to its accelerators to more developers through an agreement with model repo Hugging Face. Cerebras's inference service is now available as part of Hugging Face's Inference Providers line-up, which provides access to a variety of inference-as-a-service providers, including SambaNova, TogetherAI, Replicate, and others, via a common interface and API. ®
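The "at least four" and "more than 20" figures above follow from simple weight-storage arithmetic. A rough sketch, assuming 16-bit weights and ignoring KV cache and activations (which is why real deployments need more than this minimum):

```python
# Back-of-the-envelope check: how many 44 GB CS-3s does it take just to hold
# a model's FP16 weights? Weights only -- context and activations push the
# real number higher.
SRAM_PER_CS3_GB = 44
BYTES_PER_PARAM = 2  # FP16

for name, params_b in [("Llama 3.3 70B", 70), ("Llama 3.1 405B", 405)]:
    weight_gb = params_b * BYTES_PER_PARAM          # billions of params * 2 bytes ≈ GB
    chips = -(-weight_gb // SRAM_PER_CS3_GB)        # ceiling division
    print(f"{name}: ~{weight_gb} GB of weights -> at least {chips:.0f} CS-3s")
# -> ~140 GB -> at least 4 chips; ~810 GB -> at least 19 chips, before any
#    KV cache, consistent with the "more than 20" figure above.
```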
[4]
Cerebras announces six new AI accelerator data centers across North America and France - SiliconANGLE
Ambitious artificial intelligence startup Cerebras Systems Inc. today said it has begun deploying its wafer-scale AI accelerator chips across six new cloud data centers in North America and France to provide ultrafast AI inference. The company also announced a new partnership with Hugging Face Inc., a hub best known for hosting open-source machine learning and AI models, which will bring the company's inference platform to Hugging Face Hub.

Cerebras is best known for its specialized architecture, built around dinner-plate-sized silicon wafers, and its high-performance computing, or HPC, systems. That architecture underpins an inference service capable of serving models such as Meta Platforms Inc.'s Llama 3.3 70B at over 2,000 tokens per second.

"Cerebras is turbocharging the future of U.S. AI leadership with unmatched performance, scale and efficiency - these new global data centers will serve as the backbone for the next wave of AI innovation," said Dhiraj Mallick, chief operating officer of Cerebras Systems.

Launched in August 2024, the company's AI inference service swiftly gained traction with major AI clients. Notable customers include Mistral AI, the leading French startup behind the AI assistant and chatbot Le Chat, and Perplexity AI Inc., maker of an AI-powered search engine.

The company is expanding by launching the new data centers in Texas, Minnesota, Oklahoma and Georgia, along with campuses in Montreal, Canada, and France. Cerebras stated that it will retain full ownership of the facilities in Oklahoma City and Montreal. The other centers will be operated with its strategic partner G42.

As demand for reasoning models such as OpenAI's o3 and DeepSeek R1 continues to increase, the need for faster inference will follow. These models use a "chain of thought" technique to solve complex problems by breaking them down into smaller, logical steps and display their "thinking" as they go along. That also means a model can take minutes to reach a final answer; using Cerebras inference, the company says, these models can execute deep reasoning in seconds.

The new partnership between Hugging Face and Cerebras will bring high-speed AI inference to millions of developers around the world. Cerebras Inference is capable of running the industry's most popular models at more than 2,000 tokens per second, which the company says is more than 70x faster than comparable cloud-based solutions that use Nvidia Inc.'s most powerful graphics processing units. Being able to use the service directly within Hugging Face, at the click of a button and without going to an outside party, will make it easier for developers to experiment with models and build their own solutions faster.

This is especially important as agentic AI becomes the norm: a type of AI that can take action and achieve goals without human supervision. AI agents "reason" through complex tasks, use external tools and sift through data to complete goals, and this type of problem-solving requires a lot of AI computing power.

"By making Cerebras Inference available through Hugging Face, we're empowering developers to work faster and more efficiently with open-source AI models, unleashing the potential for even greater innovation across industries," said Andrew Feldman, chief executive of Cerebras.
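The minutes-to-seconds contrast described above is largely a throughput story: a chain-of-thought trace of a few thousand tokens takes minutes at typical GPU-serving rates but only seconds at 2,000 tokens per second. A rough illustration, where the trace length and the 30 tokens-per-second baseline are assumptions rather than figures from the article:

```python
# Rough latency comparison for a long chain-of-thought reasoning trace.
REASONING_TOKENS = 5_000   # assumed length of the model's "thinking" trace
baseline_tps = 30          # assumed typical GPU-serving throughput
cerebras_tps = 2_000       # throughput figure cited in the article

for label, tps in [("baseline GPU service", baseline_tps), ("Cerebras (claimed)", cerebras_tps)]:
    print(f"{label}: {REASONING_TOKENS / tps:.1f} s to emit {REASONING_TOKENS} reasoning tokens")
# -> roughly 167 s versus 2.5 s
```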
Developers can turn on Cerebras Inference on Hugging Face Hub by selecting "Cerebras" as their provider for any supported open-source model when using the inference application programming interface.
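In code, that selection corresponds to provider routing in the huggingface_hub client. A minimal sketch, assuming the "cerebras" provider string and a Llama 3.3 70B model ID as listed on the Hub (check the Hub's provider list for current values):

```python
# Route a chat completion through Cerebras via Hugging Face's Inference Providers.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cerebras",   # assumed provider string; see the Hub's provider list
    api_key="hf_xxx",      # a Hugging Face access token
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed Hub model ID
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
    max_tokens=200,
)
print(completion.choices[0].message.content)
```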
Cerebras Systems announces a significant expansion of its AI infrastructure, adding six new data centers across North America and Europe. The company aims to increase its inference capacity to over 40 million tokens per second, potentially disrupting Nvidia's stronghold in the AI hardware market.
Cerebras Systems, an AI hardware startup, has unveiled plans for a significant expansion of its data center footprint, positioning itself as a formidable challenger to Nvidia's dominance in the artificial intelligence market. The company will add six new AI data centers across North America and Europe, dramatically increasing its inference capacity from 2 million to over 40 million tokens per second by Q4 2025 [1][2].
The new facilities will be established in Dallas, Minneapolis, Oklahoma City, Montreal, New York, and France, with 85% of the total capacity based in the United States [1]. Cerebras will maintain full ownership of the Oklahoma City and Montreal sites, while the remaining facilities will be jointly operated under an agreement with Emirati financier G42 Cloud [3].
James Wang, Director of Product Marketing at Cerebras, emphasized the company's goal to meet the surging demand for inference tokens driven by new AI models like Llama 4 and DeepSeek [1]. This expansion is coupled with strategic partnerships, including:
- Hugging Face, whose five million developers gain one-click access to Cerebras Inference for open-source models such as Llama 3.3 70B [1][4]
- AlphaSense, a market intelligence platform that is replacing a closed-source model vendor with Cerebras to speed up its AI-powered search roughly tenfold [1][2]
Cerebras is leveraging its Wafer-Scale Engine (WSE-3) processor, which the company claims outperforms GPU solutions by a factor of 10 to 70 [2]. The company's CS-3 systems feature a wafer-scale processor measuring 46,225 mm², containing four trillion transistors across 900,000 cores and 44 GB of SRAM [3].
Wang highlighted the economic advantages of Cerebras' offering:
"Anyone who is using GPT-4 today can just move to Llama 3.70B as a drop-in replacement," he explained. "The price for GPT-4 is [about] $4.40 in blended terms. And Llama 3.70B is like 60 cents. We're about 60 cents, right? So you reduce cost by almost an order of magnitude. And if you use Cerebras, you increase speed by another order of magnitude." 1
The company is investing in disaster-resistant infrastructure, with its Oklahoma City facility designed to withstand extreme weather conditions, including the strongest recorded tornadoes [1][2]. This focus on resilience underscores Cerebras' commitment to maintaining uninterrupted AI services.
Cerebras is positioning itself as a significant contributor to domestic AI infrastructure, with 85% of its capacity located in the U.S. [2]. The company has already secured high-profile customers, including Perplexity AI and Mistral AI, who use Cerebras to power their AI search and assistant products [1][4].
As the demand for reasoning models like OpenAI's o3 and DeepSeek R1 continues to increase, Cerebras aims to capitalize on the need for faster inference. The company claims its service can execute deep reasoning in seconds, compared to minutes on other platforms [4].
Cerebras Systems' ambitious expansion and technological advancements position the company as a serious contender in the AI hardware market. By offering faster processing speeds and more cost-effective solutions, Cerebras is challenging Nvidia's dominance and potentially reshaping the landscape of AI infrastructure.