3 Sources
[1]
Omni-Path is back to take on InfiniBand?
After a five-year hiatus, Cornelis' interconnect returns at 400Gbps, with Ethernet support next

Five years after Intel spun off its Omni-Path interconnect tech into Cornelis Networks, its 400Gbps CN5000 line of switches and NICs is finally ready to do battle with its long-time rival, Nvidia's InfiniBand. This time around, Cornelis isn't just going after supercomputers and HPC clusters. It's looking to get in on the AI boom as well by undercutting Nvidia on price-performance.

For those who thought Omni-Path was dead and brain-dumped all memories of it, here's a quick refresher. Initially developed by Intel in 2015, Omni-Path is a lossless interconnect technology, similar in many respects to Nvidia's InfiniBand networking, aimed at high-performance compute applications. The first Omni-Path switches offered 4.8Tbps of bandwidth across 48 100Gbps ports, and saw deployment in a number of supercomputing platforms, like Los Alamos National Lab's Trinity system and the Department of Energy's Cori machine. However, by 2019, Intel had abandoned the project, and it spun off the division as Cornelis Networks in September 2020. Omni-Path has been around this whole time; it's just been stuck at 100Gbps.

Now, Cornelis Networks is emerging from hibernation with a full complement of 400Gbps Omni-Path switches, NICs, and cabling that the company says can support clusters of more than 500,000 endpoints with near-linear performance scaling.

Rather than dawdle any longer, let's dive into what we all really care about: speeds and feeds. First up is Cornelis' CN5000 superNIC. As with InfiniBand, you can't pair just any NIC with Cornelis' switches and still get the benefits of the Omni-Path architecture.
The card will be offered with either one or two 400Gbps ports -- presumably for redundancy rather than additional bandwidth, as its PCIe 5.0 interface can't actually support more than one port at those speeds -- and will have a 15-to-19-watt power draw, depending on whether you opt for air or liquid cooling. (The entire CN5000 line will support both.)

The NICs are designed to be paired with one of two CN5000 switches. The first is a 48-port appliance that takes up a single rack unit and offers 19.2Tbps (400Gbps per port) of switching capacity. This model is predominantly aimed at enterprise AI and HPC deployments.

For larger-scale deployments, Cornelis' 576-port CN5000 Director-class switch offers up to 230.4Tbps of aggregate bandwidth. Weighing more than 600 pounds and standing between 17 and 19 rack units tall depending on whether you opt for liquid or air cooling, the Director switch is rated for roughly 22 kilowatts of power when fully populated with pluggable optics. In fact, calling this a switch is a bit of a misnomer. It's really more of a switch chassis containing 18 CN5000 ASICs, arranged in a two-level topology with 12 leaves and six spines. This has the benefit of simplifying cabling and potentially reducing the number of optical transceivers required to support large-scale deployments.

Alongside its switches and NICs, Cornelis also offers a number of active optical and copper cables.

Compared to Nvidia's 400Gbps Quantum-2 InfiniBand and ConnectX-7 NICs, Cornelis promises up to 2x higher messaging rates, 35 percent lower latency, and 30 percent faster simulation times. As with any vendor-supplied benchmarks, take these claims with a grain of salt. More importantly, Cornelis Networks CEO Lisa Spelman, who you may remember from her time leading Intel's Xeon division, claims the products will undercut Nvidia on price by a significant margin.
While Cornelis claims a performance edge over InfiniBand, its CN5000 switches fall a bit behind on bandwidth, offering about three quarters the number of 400Gbps ports at 48 versus 64. And that's compared to Nvidia's nearly three-year-old Quantum-2 switches. Nvidia is set to boost port counts to 144 and speeds to 800Gbps with the launch of its Quantum-X800 and Quantum-X Photonics platforms later this year.

However, that higher port bandwidth probably isn't as big a deal as it might seem, especially if you're not using Nvidia GPUs. That's because 400Gbps is the fastest you can go on a PCIe 5.0 NIC anyway. The only way around that is to strap a PCIe 6.0 switch to your NIC and hang your GPUs off it. This is exactly what Nvidia has done with its ConnectX-8 NICs. Having said that, Cornelis expects to make the jump to 800Gbps next year, timed with the launch of the first PCIe 6.0-compatible CPUs from Intel and AMD.

Port count, on the other hand, may end up being a problem for Cornelis' kit depending on the scale of your network. With just 48 ports, Cornelis' CN5000 isn't a particularly high-radix switch -- which is to say, you're going to need a lot of them to support a large-scale HPC or AI training cluster.

While the CN5000 switch was designed for the enterprise, where smaller deployments are likely to be the norm, it can support large-scale environments. The company claims its equipment can support hundreds of thousands of endpoints. But to network 128,000 GPUs at 400Gbps, we estimate you'd need somewhere in the neighborhood of 13,334 CN5000s in a three-level, non-blocking topology to make it work. This topology, often referred to as a fat tree, is commonly employed in AI networks, as it offers a nice balance between bandwidth, latency, and congestion management. But if you wanted to do the same thing using Nvidia's Quantum-2 InfiniBand switches, you'd need only 10,000 of them. Moreover, if networking scale is your main priority, Ethernet has a clear advantage here.
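The PCIe ceiling mentioned above can be sanity-checked with quick arithmetic. Here is a minimal sketch, assuming a x16 slot, PCIe 5.0's 128b/130b line encoding, and a rough six percent protocol-overhead figure for PCIe 6.0's FLIT framing (the exact overhead varies with payload size):

```python
def usable_gbps(gt_per_lane: float, lanes: int, efficiency: float) -> float:
    """One-direction usable bandwidth: raw signaling rate times
    encoding/protocol efficiency."""
    return gt_per_lane * lanes * efficiency

# PCIe 5.0 x16: 32 GT/s per lane with 128b/130b encoding -> ~504 Gbps.
# Enough to feed one 400Gbps port, nowhere near enough for two.
pcie5_x16 = usable_gbps(32, 16, 128 / 130)

# PCIe 6.0 x16: 64 GT/s per lane, PAM4 + FLIT framing (~6% overhead
# assumed here) -> roughly 960 Gbps, comfortably ahead of 800Gbps.
pcie6_x16 = usable_gbps(64, 16, 0.94)

print(f"PCIe 5.0 x16: {pcie5_x16:.0f} Gbps, PCIe 6.0 x16: {pcie6_x16:.0f} Gbps")
```

Which is why both Cornelis and Nvidia tie their 800Gbps parts to the PCIe 6.0 transition: on a Gen5 slot, a second 400Gbps port can only ever be a redundant path.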
Even though Spelman insists Omni-Path isn't trying to compete with Ethernet, Ethernet is certainly evolving to compete with it. A 51.2Tbps Ethernet switch, like Broadcom's Tomahawk 5 or Nvidia's Spectrum-4, would need only 5,000 appliances to network 128,000 GPUs at 400Gbps. Broadcom's new Tomahawk 6, which we looked at last week, would be able to accomplish it with half that many. (Although, just like Nvidia's Spectrum-X800, it's going to be a little while before you can get your hands on switches based on Broadcom's latest ASICs, and even when you can, they're likely to cost substantially more than Cornelis' enterprise-focused kit.)

Networking such a colossal cluster isn't exactly easy either, which is no doubt why Cornelis opted to build the CN5000 Director in the first place. With 576 ports, only 733 of these Director switches would be required for a 128,000 GPU cluster, eliminating about a third of the cable runs.

It should be noted that while a fat-tree topology offers a useful point of comparison, it's only one of many employed in AI and HPC clusters today. Which of these will deliver the best ratio of price to performance depends heavily on the application, Spelman notes.

"You have to measure the effectiveness of your network based on the impact it has on your total cluster performance and ultimately your application performance," she said. If you base your decision on a micro-benchmark or the number of switches required, she argues, you may end up with a network that looks good on paper but isn't well optimized for application performance.

"The goal of the network is to accelerate your applications, and that's what we're trying to do. It's not networking for networking's sake," she said.

Smaller, flatter networks require fewer network hops, which reduces latency -- and for AI training workloads, that can make a big difference.
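The switch-count comparisons above all fall out of the standard sizing rule for a three-level, non-blocking fat tree: leaf and aggregation switches each spend half their ports on uplinks, so the total is roughly 5N/r switches for N endpoints on radix-r switches. A quick sketch (the per-tier ceilings put the radix-48 figure within one switch of the ~13,334 estimate):

```python
import math

def fat_tree_switches(hosts: int, radix: int) -> int:
    """Switch count for a three-level, non-blocking fat tree.

    Leaf and aggregation switches point half their ports down and half
    up, so each of those tiers needs ~2*hosts/radix switches; core
    switches point every port down, needing ~hosts/radix. The total is
    therefore ~5*hosts/radix.
    """
    leaf = math.ceil(2 * hosts / radix)
    aggregation = math.ceil(2 * hosts / radix)
    core = math.ceil(hosts / radix)
    return leaf + aggregation + core

# Networking 128,000 GPUs at 400Gbps:
for name, radix in [
    ("CN5000 (48 x 400G)", 48),
    ("Quantum-2 (64 x 400G)", 64),           # 10,000
    ("51.2T Ethernet (128 x 400G)", 128),    # 5,000
    ("Tomahawk 6 (256 x 400G)", 256),        # 2,500
]:
    print(f"{name}: {fat_tree_switches(128_000, radix):,} switches")
```

The same formula reproduces the Quantum-2 (10,000), 51.2Tbps Ethernet (5,000), and Tomahawk 6 (2,500) figures, which is why radix matters so much at this scale: halving the port count roughly doubles the number of boxes and the cables between them.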
However, as Cornelis Networks co-founder Phil Murphy points out, because the company's switches offer much lower latency than Ethernet or InfiniBand, it can actually get away with more hops without compromising on latency. And if Cornelis manages to undercut InfiniBand to a meaningful degree and the CN5000 delivers on the company's performance claims, the switches' lower radix may not be such a big deal. There's not much point in having a 128,000 GPU cluster if your network prevents you from achieving more than 30-50 percent utilization.

This is the challenge facing Ethernet scale-out fabrics, Spelman said. "Even in the best, most highly tuned environment, you're getting maybe 50-55 percent utilization. So there's so much room there to improve." For data-heavy AI training workloads, Cornelis claims Omni-Path offers 6x shorter collective communication times than RDMA over Converged Ethernet (RoCE).

When the Ethernet spec was first drafted, high-performance computing and AI training clusters weren't exactly high priorities. One of the challenges in Ethernet fabrics is packet loss. Any time a packet fails to reach its destination, it has to be retransmitted. This results in higher tail latencies, with accelerators stuck waiting for the rest of the network to catch up. AMD has previously estimated that, on average, 30 percent of training time is wasted waiting for the network.

But things are starting to change. Over the past few years, Ethernet platforms like Broadcom's Tomahawk 5 and 6, Nvidia's Spectrum-X product line, and AMD's Pensando NICs have evolved to make use of complex packet routing, congestion management, and packet-spraying techniques to achieve what they claim are InfiniBand-like levels of performance, loss, and latency.
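The tail-latency mechanism described above is easy to see in a toy model (all numbers here are illustrative, not measured): a collective operation such as an allreduce completes only when the slowest participant's traffic arrives, so even a tiny per-packet loss probability inflates the average completion time once thousands of ranks are involved.

```python
import random

def collective_time_us(ranks: int, base_us: float, loss_prob: float,
                       retransmit_us: float, rng: random.Random) -> float:
    """A collective finishes at the time of the slowest rank, so one
    retransmission timeout anywhere stalls every participant."""
    worst = base_us
    for _ in range(ranks):
        t = base_us
        if rng.random() < loss_prob:
            t += retransmit_us  # timeout plus resend of the lost packet
        worst = max(worst, t)
    return worst

rng = random.Random(42)
trials = 200
lossless = sum(collective_time_us(1024, 10, 0.0, 500, rng)
               for _ in range(trials)) / trials
lossy = sum(collective_time_us(1024, 10, 0.001, 500, rng)
            for _ in range(trials)) / trials
print(f"lossless fabric: {lossless:.0f}us; 0.1% loss: {lossy:.0f}us per collective")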
"GPU utilization on Ethernet networks built with Broadcom silicon is as good as, if not better than, networks built with InfiniBand or Omni-Path," Pete Del Vecchio, product line manager for Broadcom's Tomahawk line, told El Reg. "All of the largest GPU clusters being deployed this year - by every major hyperscaler - are using Ethernet."

"It simply isn't credible to suggest that they'd knowingly deploy a network fabric delivering only one-half or one-third the utilization of an alternative," he added.

In their current form, Cornelis' Omni-Path switches and NICs aren't designed to replace Ethernet, but that won't always be the case. Starting next year, Cornelis' 800Gbps-capable CN6000-series products will introduce cross-compatibility with Ethernet. That means you'll be able to use the company's superNICs with, say, a Broadcom switch, or its switches with something like a Pensando NIC.

At that point, we reckon Cornelis' CN6000 will be somewhat similar to Nvidia's Spectrum-X switches and BlueField superNICs: they'll work with any other Ethernet kit, but they'll perform best when paired together.

"Instead of starting with an Ethernet base and trying to add in all these features or capabilities, we're starting with the Omni-Path architecture and we're adding in Ethernet," Spelman said. "What we've done and created is this adaptation layer that allows our Ethernet to have access to some of those Omni-Path features."

This approach underscores Cornelis' transition to Ultra Ethernet as well. Founded in 2023 by industry leaders including AMD, HPE, Arista, and Broadcom, the Ultra Ethernet Consortium (UEC) aims to modernize the Ethernet protocol for use in HPC and AI applications. Cornelis has been a major supporter of Ultra Ethernet nearly from the beginning. Two years later, the first Ultra Ethernet-compatible chips are now making their way to market, even as the spec itself remains in its infancy.
"We're going to continue down the journey of integration with Ultra Ethernet, but what we started with first is our baseline architecture that already meets the feature requirements of what UEC has outlined," Spelman explained. "We're not holding up the roadmap to wait for the consortium."

In other words, Omni-Path already works, so Cornelis will add support for Ultra Ethernet when it's ready. Spelman expects that to happen in 2027, when Cornelis brings its 1.6Tbps-capable CN7000-series switches and NICs to market. ®
[2]
Cornelis Networks releases tech to speed up AI datacenter connections
SAN FRANCISCO, June 3 (Reuters) - Cornelis Networks on Tuesday released a suite of networking hardware and software aimed at linking together up to half a million artificial intelligence chips.

Cornelis, which was spun out of Intel (INTC.O) in 2020 and is still backed by the chipmaker's venture capital fund, is targeting a problem that has bedeviled AI datacenters for much of the past decade: AI computing chips are very fast, but when many of those chips are strung together to work on big computing problems, the network links between the chips are not fast enough to keep the chips supplied with data.

Nvidia (NVDA.O) took aim at that problem with its $6.9 billion purchase in 2020 of networking chip firm Mellanox, maker of networking gear based on InfiniBand, a network protocol created in the 1990s specifically for supercomputers. Networking chip giants such as Broadcom (AVGO.O) and Cisco Systems (CSCO.O) are working to solve the same set of technical issues with Ethernet technology, which has connected most of the internet since the 1980s and is an open technology standard.

The Cornelis "CN5000" networking chips use the Omni-Path network technology. The chips will ship to initial customers such as the U.S. Department of Energy in the third quarter of this year, Cornelis CEO Lisa Spelman told Reuters on May 30.

Although Cornelis has backing from Intel, its chips are designed to work with AI computing chips from Nvidia, Advanced Micro Devices or any other maker using open-source software, Spelman said. She said that the next version of Cornelis chips, in 2026, will also be compatible with Ethernet networks, aiming to alleviate any customer concerns that buying Cornelis chips would leave a data center locked into its technology.

"There's a 45-year-old architecture and a 25-year-old architecture working to solve these problems," Spelman said.
"We like to offer a new way and a new path for customers that delivers you both the (computing chip) performance and excellent economic performance as well."

Reporting by Stephen Nellis in San Francisco; Editing by Leslie Adler
[3]
Intel Spin-Off: Our InfiniBand Alternative For AI Data Centers Has A 'Devastatingly Good' Edge
Cornelis Networks CEO Lisa Spelman tells CRN that the channel-driven vendor's new 400-Gbps CN5000 family of scale-out networking solutions offers the 'best performance with devastatingly good price-performance,' and it has bigger plans for the future.

As the growing size and complexity of AI models keeps the tech industry hungry for increasingly faster and more expensive data centers, Intel spin-off Cornelis Networks said its new family of high-speed networking products can help companies save money and increase overall performance for such infrastructure.

In an interview with CRN, Cornelis Networks CEO Lisa Spelman said the company's 400-Gbps CN5000 family of scale-out networking solutions, which launched Tuesday, offers the "best performance with devastatingly good price-performance" compared to offerings based on InfiniBand or Ethernet, including those made by Nvidia.

Cornelis, which also unveiled an updated road map that promises Ethernet compatibility with the next-generation, 800-Gbps CN6000 family in 2026, spun out of Intel in 2020, roughly a year after the semiconductor giant halted development of its Omni-Path architecture for data centers running high-performance computing and AI workloads.

Spelman, a former Intel executive who became CEO of the channel-driven company last year, said Omni-Path is what allows the CN5000 networking products to provide two times higher message rates, 35 percent lower latency and up to 30 percent faster performance for high-performance computing workloads like computational fluid dynamics in comparison to Nvidia's 400-Gbps InfiniBand NDR solution.
Equipped to support data center deployments of up to 500,000 nodes, the CN5000 family of products -- which includes SuperNICs, switches, cabling and management software -- can also deliver six times faster collective communication for AI applications than RDMA over Converged Ethernet (RoCE) solutions, according to Spelman.

"Everything on paper can look the same when you're saying, 'Oh, okay, this is a 400-gig NIC, and this is a 400-gig NIC, and this is a 400-gig NIC,' but it's the architecture that starts to draw out the performance underneath that, because our assumption is everyone's going to operate at essentially the same bandwidth," she said last Thursday.

The network improvements, made possible by features like credit-based flow control and dynamic fine-grained adaptive routing, translate to improved utilization for GPUs and other processors at the center of AI data centers, which traditionally go underused because of inefficiency in the network, according to Spelman. "I really don't think as many people as you would think truly understand how underutilized some of that compute is," she said.

A point of pride for Spelman is the fact that the ASICs (application-specific integrated circuits) underlying the CN5000 switch and SuperNIC products are the first versions of the chips that were manufactured -- a rarity in the semiconductor industry. "I've been in the industry 25 years, and this is my first time being part of a team delivering first-pass, production-quality silicon success," she said. "And it just shows the commitment to quality, to engineering excellence, to use of modern tools, including AI [and] emulation -- all of that for our design phase, our modeling phase [and] our validation phase."
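Credit-based flow control, the first feature Spelman cites, is what makes a fabric "lossless" in the first place: a sender transmits only while it holds a credit representing a free buffer at the receiver, so congestion produces back-pressure rather than dropped packets. A minimal sketch of the idea (a toy model, not Cornelis' actual implementation):

```python
from collections import deque

class CreditLink:
    """Toy model of credit-based flow control: the sender may transmit
    only while it holds a credit, one per free receive buffer, so the
    receiver can never overflow and the link never drops a packet."""

    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots
        self.rx_queue = deque()

    def send(self, pkt: str) -> bool:
        if self.credits == 0:
            return False           # back-pressure: hold the packet, don't drop it
        self.credits -= 1
        self.rx_queue.append(pkt)  # lands in a guaranteed-free buffer
        return True

    def consume(self) -> str:
        pkt = self.rx_queue.popleft()
        self.credits += 1          # freed buffer -> credit returned to sender
        return pkt

link = CreditLink(buffer_slots=2)
assert link.send("a") and link.send("b")
assert not link.send("c")   # no credits left: sender must wait
link.consume()              # receiver drains a buffer, returning a credit
assert link.send("c")       # now the send succeeds
```

A real link layer tracks credits per virtual lane and returns them in-band, but the invariant is the same: with no free buffer, nothing is sent, so nothing is dropped and no retransmission tail latency is incurred.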
By improving the utilization of computer chips across a data center with Cornelis' Omni-Path-based products, operators of such infrastructure can run larger AI models without needing to expand capacity, or run the same size of model using less capacity. "It's the first time they're seeing all this optionality in these choices about bigger models or the same model for a lower price, all driven through the network, fundamentally helping solve that compute underutilization challenge," Spelman said.

The CN5000, which will become broadly available in the third quarter after it starts shipping to customers this month, "seamlessly interoperates" with CPUs, GPUs and accelerator chips from AMD, Intel, Nvidia and other companies, according to Spelman. However, she added, Nvidia's GPU-accelerated DGX systems "won't be an area for us to play in," though she does see a big opportunity for the CN5000 products with servers that use Nvidia's HGX baseboard, which connects eight GPUs via NVLink.

"If you look at their HGX offering, which has a lot more flexibility on what additional kind of components go in there, we offer a great alternative there, because we give people the option to do something a bit different than what's in the standard DGX and also really drive up the performance," Spelman said.

While the CN5000 products are aimed at on-premises deployments for enterprise, government and academic customers running AI and HPC workloads, Cornelis sees substantially bigger market opportunities with the next two generations of products. With next year's 800-Gbps CN6000 products, for instance, Cornelis plans to enable Ethernet compatibility with RoCE support that will allow Ethernet-based networks to access some of Omni-Path's features, according to Spelman.

Spelman said this integration, which Cornelis has been developing with a "cloud-definitional customer," will help the company start to pursue the cloud infrastructure market and stand apart from other Ethernet networking startups.
"It was the innovation around creating the adaptation layer that we've built in that allows you to get access to those [Omni-Path] features, because otherwise we're just another standard Ethernet vendor banging our head against a big wall," she said.

Then in 2027, the company plans to release its 1.6-Tbps CN7000 products, which will integrate standards from the widely supported Ultra Ethernet Consortium into Omni-Path. They will also feature support for 2 million nodes and in-network computing. This will establish a "new performance frontier for the most demanding AI and HPC environments," according to Cornelis.

Spelman said one of the reasons she's convinced Cornelis can become a formidable player is that the company has the "most advanced" features and capabilities in the industry, some of which are now being implemented by other companies for their own products. She also said standards being developed by the Ultra Ethernet Consortium -- which counts Cornelis, Intel, Broadcom, AMD, Nvidia and others as members -- show that the "ideal scale-out network looks a lot like the Omni-Path architecture."

"So from the AI perspective, I think we're absolutely on the right track with the features, capability and performance, because we're now seeing some of the larger AI players literally emulating the base architecture we have here," Spelman said.

Given Cornelis' roots of selling high-speed interconnects for HPC workloads before AI became a worldwide phenomenon, "channel partners have always been super critical" for the company's go-to-market efforts, Spelman said. As such, the company runs a global partner program with a growing roster of partners that includes Ace Computers, Advanced Clustering Technologies, Colfax International, Microway and Penguin Solutions. Cornelis also works closely with OEMs like Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro.
Spelman said Cornelis last fall hired Intel channel veteran Mike Lafferty, who previously ran the semiconductor giant's HPC board of advisors, to grow the partner network. And last month the company hired former Intel sales executive Brad Haczynski, who was most recently at Samsung, as chief commercial officer. "The fact that we're a startup, I think we're doing a good job of being focused in this space, because we recognize how important it is," Spelman said.

Mako Furukawa, a senior sales engineer at Wheat Ridge, Colo.-based Cornelis partner Aspen Systems, said the company is strengthening its partnership with Cornelis to implement Omni-Path solutions "across our education, government and commercial customers."

"The new CN5000 line represents a significant advancement in the Omni-Path architecture that our customers have been eagerly anticipating," said Furukawa in a statement.
Cornelis Networks, an Intel spin-off, launches its 400Gbps CN5000 line of switches and NICs, reviving the Omni-Path interconnect technology to compete with Nvidia's InfiniBand in AI and HPC markets, promising better performance at lower costs.
Five years after Intel spun off its Omni-Path interconnect technology, Cornelis Networks has emerged with a new 400Gbps line of switches and NICs, aiming to challenge Nvidia's dominance in the high-performance computing (HPC) and artificial intelligence (AI) networking market [1]. The CN5000 series, set to ship in the third quarter of 2025, promises to deliver improved performance and cost-effectiveness compared to existing solutions [2].
The CN5000 superNIC offers one or two 400Gbps ports, with a power draw of 15-19 watts depending on cooling options. Cornelis is introducing two switch models:

- A 48-port, 1U switch with 19.2Tbps of switching capacity, aimed at enterprise AI and HPC deployments
- The 576-port CN5000 Director-class switch, offering up to 230.4Tbps of aggregate bandwidth for larger-scale deployments
Cornelis claims its technology provides up to 2x higher messaging rates, 35% lower latency, and 30% faster simulation times compared to Nvidia's 400Gbps Quantum-2 InfiniBand and ConnectX-7 NICs [1]. The company also boasts that its CN5000 family can support clusters of more than 500,000 endpoints with near-linear performance scaling [3].
Cornelis Networks is not just targeting traditional HPC applications but is also looking to capitalize on the AI boom. CEO Lisa Spelman, a former Intel executive, emphasizes the company's focus on providing "the best performance with devastatingly good price-performance" compared to InfiniBand and Ethernet-based solutions [3].
The company's strategy includes:

- Undercutting Nvidia on price while claiming a performance edge
- Interoperating with CPUs, GPUs and accelerators from AMD, Intel, Nvidia and other vendors
- Selling through channel partners and OEMs such as Dell Technologies, HPE, Lenovo and Supermicro
Cornelis has ambitious plans for the future, including:

- The 800Gbps CN6000 family in 2026, which will add Ethernet compatibility
- The 1.6Tbps CN7000 family in 2027, which will integrate Ultra Ethernet Consortium standards and support up to 2 million nodes with in-network computing
These developments could potentially disrupt the current networking landscape dominated by Nvidia's InfiniBand and traditional Ethernet solutions from companies like Broadcom and Cisco [2].
Despite its promising technology, Cornelis faces significant challenges:

- The CN5000's 48-port radix trails Nvidia's 64-port Quantum-2, so large clusters require more switches
- Nvidia plans to raise port counts to 144 and speeds to 800Gbps with its Quantum-X800 platform later this year
- Hyperscalers are overwhelmingly deploying Ethernet for their largest GPU clusters
Additionally, Ethernet technology continues to evolve, potentially competing with Omni-Path in certain applications [1].
The re-emergence of Omni-Path technology could have significant implications for the HPC and AI networking market:

- Greater price competition in a segment dominated by Nvidia's InfiniBand
- A credible alternative for customers wary of being locked into a single vendor's technology
- Further momentum behind Ultra Ethernet as vendors converge on common standards
As AI models continue to grow in size and complexity, the demand for high-performance, cost-effective networking solutions is likely to increase, creating opportunities for companies like Cornelis Networks to challenge established players in the market.