2 Sources
[1]
Nvidia outlines plans for using light for communication between AI GPUs by 2026 -- silicon photonics and co-packaged optics may become mandatory for next-gen AI data centers
The extreme demands of communication between ever-growing clusters of AI GPUs are fueling a move toward using light across the networking layers. Earlier this year, Nvidia outlined that its next-generation rack-scale AI platforms will use silicon photonics interconnects with co-packaged optics (CPO) for higher transfer rates at lower power. At this year's Hot Chips conference, Nvidia released additional information about its next-generation Quantum-X and Spectrum-X photonics interconnection solutions and when they will arrive in 2026.

Nvidia's roadmap will likely closely follow TSMC's COUPE roadmap, which unfolds in three stages. The first generation is an optical engine for OSFP connectors, offering 1.6 Tb/s data transfer while reducing power consumption. The second generation moves into CoWoS packaging with co-packaged optics, enabling 6.4 Tb/s at the motherboard level. The third generation aims for 12.8 Tb/s within processor packages and targets further cuts in power and latency.

In large-scale AI clusters, thousands of GPUs must behave as one system, which changes how these processors are interconnected: instead of each rack having its own Tier-1 (Top-of-Rack) switch linked by short copper cables, the switches are moved to the end of the row to create a consistent, low-latency fabric across multiple racks. This relocation greatly extends the distance between servers and their first switch, which makes copper impractical at speeds like 800 Gb/s, so optical connections are required for nearly every server-to-switch and switch-to-switch link.

Using pluggable optical modules in this environment introduces clear limits: data signals leave the ASIC, travel across the board and connectors, and only then are converted to light. That path produces severe electrical loss, up to roughly 22 decibels on 200 Gb/s channels, which requires compensation that relies on complex processing and raises per-port power consumption to 30W (which in turn calls for additional cooling and creates a point of potential failure). According to Nvidia, this becomes almost untenable as AI deployments grow in scale.

CPO sidesteps the penalties of traditional pluggable optical modules by embedding the optical conversion engine alongside the switch ASIC, so instead of traveling over long electrical traces, the signal is coupled to fiber almost immediately. As a result, electrical loss is cut to 4 decibels, and per-port power consumption drops to 9W. Such a layout removes numerous components that could fail and greatly simplifies the implementation of optical interconnects.

Nvidia claims that by moving away from traditional pluggable transceivers and integrating optical engines directly into switch silicon (courtesy of TSMC's COUPE platform), it achieves substantial gains in efficiency, reliability, and scalability. The improvements over pluggable modules are dramatic, according to Nvidia: a 3.5-times increase in power efficiency, 64-times better signal integrity, a 10-times boost in resiliency thanks to fewer active devices, and roughly 30% faster deployment because service and assembly are simpler.

Nvidia will introduce CPO-based optical interconnection platforms for both Ethernet and InfiniBand. First, the company plans to introduce Quantum-X InfiniBand switches in early 2026. Each switch will deliver 115 Tb/s of throughput, supporting 144 ports operating at 800 Gb/s each.
The system also integrates an ASIC featuring 14.4 TFLOPS of in-network processing and supporting Nvidia's 4th Generation Scalable Hierarchical Aggregation Reduction Protocol (SHARP) to cut latency for collective operations. The switches will be liquid-cooled.

In parallel, Nvidia is set to bring CPO to Ethernet with its Spectrum-X Photonics platform in the second half of 2026. It will rely on the Spectrum-6 ASIC, which will power two devices: the SN6810, which provides 102.4 Tb/s of bandwidth with 128 ports at 800 Gb/s, and the larger SN6800, which scales to 409.6 Tb/s and 512 ports at the same rate. Both also use liquid cooling.

Nvidia envisions that its CPO-based switches will power new AI clusters for generative AI applications that are getting larger and more sophisticated. Thanks to CPO, such clusters will eliminate thousands of discrete components, offering faster installation, easier servicing, and reduced power consumption per connection. As a result, clusters using Quantum-X InfiniBand and Spectrum-X Photonics offer improvements in metrics such as time-to-turn-on, time-to-first-token, and long-term reliability.

Nvidia stresses that co-packaged optics are not an optional enhancement but a structural requirement for future AI data centers, which implies that the company will position its optical interconnects as a key advantage over rack-scale AI solutions from rivals such as AMD. That, of course, is why AMD has acquired Enosemi.

One important thing to note about Nvidia's silicon photonics initiative is that its evolution is tightly aligned with TSMC's COUPE (Compact Universal Photonic Engine) platform, which is set to evolve along the three-stage roadmap described above, improving Nvidia's CPO platforms with it. TSMC's first-generation COUPE is built by stacking a 65nm electronic integrated circuit (EIC) with a photonic integrated circuit (PIC) using the company's SoIC-X packaging technology.
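As a quick sanity check on the port figures quoted above (our arithmetic, not Nvidia's): a switch's aggregate throughput is simply its port count times its per-port rate, which reproduces all three headline bandwidth numbers.

```python
# Back-of-the-envelope check of the switch throughput figures quoted above.
# Aggregate throughput = port count x per-port rate.

PORT_RATE_GBPS = 800  # per-port rate quoted for all three switches

switches = {
    "Quantum-X (InfiniBand)": 144,  # ports
    "SN6810 (Spectrum-X)": 128,
    "SN6800 (Spectrum-X)": 512,
}

for name, ports in switches.items():
    total_tbps = ports * PORT_RATE_GBPS / 1000  # Gb/s -> Tb/s
    print(f"{name}: {ports} x {PORT_RATE_GBPS} Gb/s = {total_tbps:.1f} Tb/s")

# Prints:
# Quantum-X (InfiniBand): 144 x 800 Gb/s = 115.2 Tb/s
# SN6810 (Spectrum-X): 128 x 800 Gb/s = 102.4 Tb/s
# SN6800 (Spectrum-X): 512 x 800 Gb/s = 409.6 Tb/s
```

The 144-port configuration works out to 115.2 Tb/s, which Nvidia rounds to the quoted 115 Tb/s.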
[2]
Nvidia to deploy light-based GPU interconnects by 2026
The company's Quantum-X InfiniBand and Spectrum-X Ethernet platforms will deliver up to 409.6 Tb/s using liquid-cooled switches built on TSMC's COUPE technology. Nvidia is planning to implement light-based communication between its artificial intelligence GPUs by 2026, utilizing silicon photonics interconnects with co-packaged optics (CPO) in its next-generation rack-scale AI platforms to achieve higher transfer rates at reduced power consumption.

At the Hot Chips conference, Nvidia provided further details regarding its upcoming Quantum-X and Spectrum-X photonics interconnection solutions, outlining their expected arrival in 2026. These solutions represent a significant move toward optical interconnects to manage the increasing demands of data transfer within large AI GPU clusters.

Nvidia's development timeline is expected to closely mirror TSMC's COUPE (Compact Universal Photonic Engine) roadmap, which is structured into three distinct phases. The initial phase involves an optical engine designed for OSFP connectors, facilitating data transfers of 1.6 Tb/s while simultaneously lowering power consumption. The second phase transitions to CoWoS packaging incorporating co-packaged optics, thereby achieving 6.4 Tb/s data transfer rates at the motherboard level. The third phase focuses on achieving 12.8 Tb/s within processor packages, with the objective of further decreasing both power usage and latency.

The necessity for CPO stems from the challenges of interconnecting thousands of GPUs in large-scale AI clusters, which must operate as a unified system. This architecture necessitates modifications to traditional networking configurations. Specifically, instead of each rack having its own Tier-1 (Top-of-Rack) switch connected by short copper cables, the switches are relocated to the end of the row. This configuration establishes a consistent, low-latency fabric spanning multiple racks. The relocation increases the distance between servers and their primary switch, rendering copper cables impractical at high speeds such as 800 Gb/s. Consequently, optical connections become essential for nearly all server-to-switch and switch-to-switch links.

The use of pluggable optical modules in such environments presents inherent limitations. In these designs, data signals exit the Application-Specific Integrated Circuit (ASIC), traverse the board and connectors, and are only then converted to light. This process introduces significant electrical loss, reaching approximately 22 decibels on 200 Gb/s channels. Compensating for this loss requires complex processing, which raises per-port power consumption to 30W. This, in turn, necessitates additional cooling and introduces potential points of failure. Nvidia asserts that these issues become increasingly problematic as the scale of AI deployments expands.

CPO mitigates the drawbacks of traditional pluggable optical modules by integrating the optical conversion engine directly alongside the switch ASIC. This proximity allows the signal to be coupled to fiber almost immediately, bypassing extended electrical traces. As a result, electrical loss is reduced to 4 decibels, and per-port power consumption decreases to 9W. This arrangement also eliminates numerous components that could potentially fail, simplifying the implementation of optical interconnects.
Nvidia asserts that transitioning away from conventional pluggable transceivers and integrating optical engines directly into switch silicon, facilitated by TSMC's COUPE platform, yields substantial improvements in efficiency, reliability, and scalability. Nvidia reports that CPO offers significant advantages over pluggable modules, including a 3.5-times increase in power efficiency, a 64-times improvement in signal integrity, a 10-times increase in resilience due to the reduction in active devices, and approximately 30% faster deployment owing to simpler service and assembly procedures.

Nvidia plans to introduce CPO-based optical interconnection platforms for both Ethernet and InfiniBand technologies. The company anticipates launching Quantum-X InfiniBand switches in early 2026. Each switch is designed to provide 115 Tb/s of throughput, accommodating 144 ports operating at 800 Gb/s each. The system also incorporates an ASIC featuring 14.4 TFLOPS of in-network processing and supports Nvidia's 4th Generation Scalable Hierarchical Aggregation Reduction Protocol (SHARP), aimed at reducing latency for collective operations. These switches will utilize liquid cooling.

Concurrently, Nvidia is preparing to integrate CPO into Ethernet through its Spectrum-X Photonics platform, scheduled for release in the second half of 2026. This platform will be based on the Spectrum-6 ASIC, which will power two distinct devices: the SN6810, offering 102.4 Tb/s of bandwidth across 128 ports at 800 Gb/s, and the SN6800, which scales to 409.6 Tb/s and 512 ports operating at the same rate. Both devices will also employ liquid cooling.

Nvidia envisions that its CPO-based switches will drive new AI clusters designed for generative AI applications, which are becoming increasingly large and complex. By utilizing CPO, these clusters will eliminate thousands of discrete components, resulting in faster installation, easier servicing, and reduced power consumption per connection. Consequently, clusters utilizing Quantum-X InfiniBand and Spectrum-X Photonics are expected to demonstrate improvements in metrics such as time-to-turn-on, time-to-first-token, and overall long-term reliability.

Nvidia emphasizes that co-packaged optics are not simply an optional enhancement but a fundamental requirement for future AI data centers. This suggests that the company intends to position its optical interconnects as a key differentiator over rack-scale AI solutions offered by competitors such as AMD. AMD's acquisition of Enosemi is relevant in this context.

A critical aspect of Nvidia's silicon photonics initiative is its close alignment with the evolution of TSMC's COUPE (Compact Universal Photonic Engine) platform. As TSMC's platform advances in the coming years, Nvidia's CPO platforms are expected to improve correspondingly. The first generation of TSMC's COUPE is constructed by stacking a 65nm electronic integrated circuit (EIC) with a photonic integrated circuit (PIC) using the company's SoIC-X packaging technology.

The TSMC COUPE roadmap is divided into three stages of development. The initial generation involves an optical engine designed for OSFP connectors, providing 1.6 Tb/s data transfer while reducing power consumption. The second generation incorporates CoWoS packaging with co-packaged optics, resulting in a data transfer rate of 6.4 Tb/s at the motherboard level.
The third generation is designed to achieve 12.8 Tb/s within processor packages and aims to further reduce power consumption and latency.
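For context on the loss figures quoted in both articles: decibels are logarithmic, so the gap between 22 dB and 4 dB is much larger than it looks. A minimal conversion sketch follows; the dB figures come from the articles, and the formula is the standard dB-to-power-ratio conversion.

```python
# Convert the quoted electrical channel losses from decibels to linear
# power ratios. Standard conversion: ratio = 10 ** (dB / 10).

def db_to_power_ratio(db: float) -> float:
    return 10 ** (db / 10)

print(db_to_power_ratio(22))  # ~158.5 -> pluggable path attenuates ~158x
print(db_to_power_ratio(4))   # ~2.5  -> CPO path attenuates only ~2.5x
```

The roughly 63-fold gap between the two attenuation factors is close to Nvidia's quoted "64 times" signal-integrity improvement, though Nvidia does not state that this is how the figure was derived; it is also what drives the complex compensation processing, and the 30W-per-port power budget, in pluggable designs.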
Nvidia announces plans to implement silicon photonics and co-packaged optics for AI GPU communication by 2026, promising higher transfer rates and lower power consumption in next-gen AI data centers.
Nvidia has unveiled ambitious plans to revolutionize communication between AI GPUs by 2026, leveraging light-based technology to meet the extreme demands of next-generation AI data centers [1][2]. The company aims to implement silicon photonics interconnects with co-packaged optics (CPO) in its upcoming rack-scale AI platforms, promising higher transfer rates at lower power consumption.
As AI clusters grow in scale and complexity, the challenge of interconnecting thousands of GPUs to function as a single system has become increasingly apparent. Traditional networking configurations, which rely on copper cables and pluggable optical modules, are reaching their limits in terms of speed, power efficiency, and scalability [1].
Nvidia's solution involves relocating switches to the end of the row, creating a consistent, low-latency fabric across multiple racks. This architectural change necessitates optical connections for nearly all server-to-switch and switch-to-switch links, as copper becomes impractical at speeds like 800 Gb/s over extended distances [1].
The heart of Nvidia's innovation lies in the adoption of co-packaged optics (CPO). This technology embeds the optical conversion engine alongside the switch ASIC, dramatically reducing electrical loss and power consumption [1]. Nvidia reports that CPO offers significant advantages over traditional pluggable modules:

- A 3.5-times increase in power efficiency
- A 64-times improvement in signal integrity
- A 10-times boost in resiliency, thanks to fewer active devices
- Roughly 30% faster deployment, since service and assembly are simpler
Nvidia's roadmap includes two major platforms leveraging CPO technology [1][2]:
Quantum-X InfiniBand switches (Early 2026): 115 Tb/s of throughput across 144 ports at 800 Gb/s each, plus an ASIC with 14.4 TFLOPS of in-network processing and support for Nvidia's 4th Generation SHARP to cut latency for collective operations.
Spectrum-X Photonics (Second half of 2026): built on the Spectrum-6 ASIC, powering the SN6810 (102.4 Tb/s across 128 ports at 800 Gb/s) and the larger SN6800 (409.6 Tb/s across 512 ports at the same rate).
Both platforms will utilize liquid cooling to manage the high-performance requirements [2].
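To make the per-port power figures concrete at data-center scale, here is an illustrative estimate: the 30W and 9W per-port numbers come from the sources above, while the link count is a hypothetical chosen purely for illustration.

```python
# Illustrative cluster-scale estimate using the quoted per-port figures.
# NUM_OPTICAL_LINKS is a hypothetical value, not an Nvidia specification.

PLUGGABLE_W = 30  # W per port, pluggable module (per the sources)
CPO_W = 9         # W per port, co-packaged optics (per the sources)
NUM_OPTICAL_LINKS = 100_000  # hypothetical large AI cluster

saved_mw = NUM_OPTICAL_LINKS * (PLUGGABLE_W - CPO_W) / 1e6
print(f"optics power saved: {saved_mw:.1f} MW")        # 2.1 MW
print(f"per-port ratio: {PLUGGABLE_W / CPO_W:.2f}x")   # 3.33x
```

The 3.33x per-port ratio is broadly in line with the 3.5-times power-efficiency improvement Nvidia claims for CPO overall.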
Nvidia's development closely follows TSMC's Compact Universal Photonic Engine (COUPE) roadmap, which unfolds in three stages [1]:

- First generation: an optical engine for OSFP connectors, offering 1.6 Tb/s data transfer while reducing power consumption
- Second generation: CoWoS packaging with co-packaged optics, enabling 6.4 Tb/s at the motherboard level
- Third generation: 12.8 Tb/s within processor packages, targeting further cuts in power and latency
Nvidia emphasizes that co-packaged optics are not just an optional enhancement but a structural requirement for future AI data centers [1]. The company envisions that its CPO-based switches will power new AI clusters for increasingly sophisticated generative AI applications, offering improvements in key metrics such as time-to-turn-on, time-to-first-token, and long-term reliability [1][2].
By eliminating thousands of discrete components, these new clusters promise faster installation, easier servicing, and reduced power consumption per connection. This positions Nvidia's optical interconnects as a key advantage over rack-scale AI solutions from competitors like AMD [1].
As the AI industry continues to evolve rapidly, Nvidia's investment in light-based GPU interconnects represents a significant step forward in addressing the growing demands of large-scale AI deployments. The success of this technology could reshape the landscape of AI data centers in the coming years.