2 Sources
[1]
AMD's 20x30 AI efficiency target hinges on rack scale
Who'd have thunk? The bigger the iron, the more efficient it gets

With Moore's Law on its last legs and datacenter power consumption a growing concern, AMD is embarking on an ambitious new goal to boost the energy efficiency of its chips 20-fold before 2030. And it sees rack-scale architectures as a key design point to get there.

"The counterintuitive thing here... is the bigger the device, the more efficient it is," AMD SVP and Fellow Sam Naffziger tells El Reg. "But what we're getting is what used to be a whole rack of compute devices in a single package."

AMD was among the first to apply this logic to its CPUs and GPUs, embracing a chiplet architecture that enabled it to overcome reticle limits and squeeze more performance from each watt consumed. The ultimate culmination of this philosophy was AMD's MI300 series of APUs and GPUs, which formed a dense sandwich of 3D stacked compute, I/O dies, and interposers.

Now, AMD is looking beyond the chip package and even the node to the rack scale to drive efficiencies over the next few years. "That's the way we're going to be able to deliver continued significant improvements is being able to architect almost at the data center level," Naffziger said.

AMD isn't the first to reach this conclusion. At GTC last year, Nvidia revealed its first rack-scale system, the GB200 NVL72.

Traditionally, both companies' GPU systems have used high-speed interconnects like NVLink or InfiniBand to pool their resources, making four or eight accelerators function as one great big one. With the GB200 NVL72, Nvidia extended this scale-up network to the rack level, using 18 NVLink switch chips to make the 120kW monster's 72 Blackwell GPUs behave as one. This spring, Nvidia unveiled its plans to extend this architecture to 144 and eventually 576 GPUs and up to 600kW of power.

However, the idea dates back much further. "Rack scale is really re-inventing the scale-up multi-processing that IBM did in the 80s with shared memory spaces, load and store," but rather than a few dozen System/370 mainframes, we're now talking about tens, potentially hundreds of GPUs, Naffziger contends.

AMD's first rack-scale compute platform is slated to arrive next year with the launch of its MI400. Naffziger suggests it'll follow the same basic formula as Nvidia's NVL systems, albeit using the Universal Accelerator Link (UALink) interconnect rather than NVLink. However, future designs could end up looking quite a bit different.

Most notably, Naffziger expects photonic interconnects could replace copper in the scale-up fabrics within the next five years. Co-packaged optics (CPO) have long promised greater bandwidth and reach than copper cables or traces, but have been held back by the increased power consumption associated with the lasers.

"Everything's driven by economics, and we're at the point where economics will favor optical," Naffziger said.

For all the advantages co-packaged optics presents, it isn't perfect. "There are temperature sensitivities with optical," Naffziger said. "There's a lot more to worry about than in electrical space... Now we've got to route fiber attach and make sure it's mechanically robust and not susceptible to vibration."

This might explain why Nvidia has focused its early photonics efforts on the scale-out Ethernet and InfiniBand networks rather than boutique chip-to-chip interconnects. Most large-scale networks already require extensive use of power-hungry pluggable optics.
So, for its first batch of photonic switches, Nvidia is using CPO to eliminate the need for these devices. However, for its NVLink switch fabric, the company appears to be opting for greater rack densities, up to 600kW by 2027, in order to stick to copper.

As AMD prepares to scale up, Naffziger notes that process technology and improvements in semiconductor packaging will continue to play a role in achieving its 20x30 goal. "There's still the remnants of Moore's law out there," he said. "We've got to use the latest process nodes."

While process technology isn't shrinking as quickly as it once did, there are still improvements to be had -- especially when it comes to memory. Naffziger pointed to 3D stacking and base die customization for high-bandwidth memory (HBM) as potential avenues for driving down the energy per bit and reducing overall power consumption.

HBM accounts for a substantial amount of accelerator power consumption today. You may recall that with the jump from 192GB on the MI300X to 256GB on the MI325X, power consumption increased by 250W. So any packaging technologies that allow for both higher bandwidth and capacity while also curbing power consumption are worth investigating at the very least.

Even at rack scale, Naffziger says the "biggest improvements are going to be the fruit of hardware-software co-design. The raw hardware gains are reaching diminishing returns."

AMD has trailed in software, particularly when it comes to low-level development. However, the situation has improved considerably in the year and a half since its MI300X made its debut.

The chip shop has invested considerable resources to optimize its ROCm software stack for a wide range of popular inference and training platforms, including vLLM, SGLang, and PyTorch. These efforts have been bolstered by several acquisitions, including Nod.ai, Mipsology, and Brium.

AMD has also been eager to attract AI talent. Most recently, Sharon Zhou, CEO of AMD-friendly startup Lamini, which helps companies tune LLMs to reduce hallucinations, announced her plans to join the House of Zen's AI software efforts on Wednesday.

"When we talk about a rack-scale goal, there definitely are big opportunities in system architecture, system design, improved components, integration reducing the cost of communication," Naffziger said. "But we've got to map the workload optimally on that hardware."

FP8 and now FP4 support is just one example of this. On the model side, these lower-precision datatypes offer a number of advantages, trading often imperceptibly lower output quality for a smaller memory footprint. Meanwhile, halving the precision usually doubles the floating-point output of an accelerator.

However, it can take software time to catch up to these new datatypes. It took the better part of a year from the time the MI300X launched to when the popular vLLM inference engine extended support for AMD's FP8 implementation.

Software may be key to unlocking the full potential of AMD's silicon, but it also presents challenges for measuring performance, particularly on AI workloads. The AI ecosystem is moving incredibly quickly: in a matter of months, a model can go from bleeding edge to antiquated. "We can't assume Llama 405B is going to be here in 2030 and have any meaning," Naffziger said.

So, for its 20x30 goal, AMD will use a combination of GPU FLOPS, HBM bandwidth, and network bandwidth, weighted differently for inference and training, to keep track of its progress. ®
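To make that weighting idea concrete, here is a minimal Python sketch of how such a blended efficiency index might be computed. The weights and accelerator figures below are invented for illustration; neither the article nor AMD discloses the actual formula.

```python
# Toy sketch of a 20x30-style efficiency index. All weights and
# figures are invented -- AMD's real methodology is not public.

def efficiency_index(flops_tps, hbm_gbs, net_gbs, power_w, weights):
    """Blend compute, memory, and network throughput, then divide by power."""
    w_flops, w_hbm, w_net = weights
    blended = (flops_tps ** w_flops) * (hbm_gbs ** w_hbm) * (net_gbs ** w_net)
    return blended / power_w

# Hypothetical accelerator figures, not real product specs.
flops, hbm_bw, net_bw, power = 5000.0, 8000.0, 1000.0, 1200.0

# Inference might weight memory bandwidth more heavily than training.
inference = efficiency_index(flops, hbm_bw, net_bw, power, (0.4, 0.4, 0.2))
training = efficiency_index(flops, hbm_bw, net_bw, power, (0.5, 0.25, 0.25))
print(f"inference index: {inference:.2f}, training index: {training:.2f}")
```

A geometric blend is used here so that no single throughput term can dominate the index, but that design choice is the sketch's, not AMD's.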
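Similarly, the FP8 trade-off described above can be sketched in a few lines of PyTorch, which exposes a float8_e4m3fn dtype in recent releases. This is a generic illustration of per-tensor FP8 quantization, not AMD's or vLLM's actual FP8 code path.

```python
# Generic per-tensor FP8 quantization sketch -- illustrative only.
import torch

w = torch.randn(4096, 4096, dtype=torch.float16)

# Per-tensor scaling keeps values inside FP8's narrow dynamic range.
scale = w.abs().max() / 448.0  # 448 is float8_e4m3fn's largest finite value
w_fp8 = (w / scale).to(torch.float8_e4m3fn)

# Half the memory footprint per element...
print(w.element_size(), "bytes/elem ->", w_fp8.element_size(), "byte/elem")

# ...at the cost of a small quantization error.
w_back = w_fp8.to(torch.float16) * scale
err = (w - w_back).abs().mean() / w.abs().mean()
print(f"mean relative error: {err.item():.3%}")
```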
[2]
AMD CTO: power constraints, not compute, will shape tomorrow's supercomputers
In a nutshell: In his keynote at ISC 2025, AMD CTO Mark Papermaster emphasized that the next wave of supercomputers will demand breakthroughs in efficiency, reliability, and adaptability, not just raw performance. He noted that industry leaders are now grappling with the realities of explosive growth and the increasingly complex challenges it brings.

Papermaster began by highlighting the continued surge in demand for high-performance computing, driven primarily by artificial intelligence. He pointed to the emergence of new systems in Germany - such as Jupiter and Blue Lion - as evidence of the sector's rapid expansion. "The performance of supercomputers will continue to increase rapidly," he said, emphasizing that it is user demand that fuels innovation and progress.

While AI remains at the forefront, Papermaster stressed that traditional computing techniques are still essential, especially in scientific applications where precision is critical. He explained that double-precision (FP64) calculations remain vital, and although lower-precision formats like FP16 and FP8 are becoming more popular for certain workloads, a hybrid approach is necessary. "It's not just FLOPS!" he said, underscoring that raw compute power is only part of the equation.

The conversation soon turned to the practical challenges facing the industry. Papermaster noted that while computational performance can, in theory, scale rapidly, real-world constraints such as bandwidth and power consumption are becoming increasingly critical. He warned that the total board power of accelerators is projected to reach 1,600 watts or more by 2026 or 2027, with 2,000 watts potentially on the horizon. "Power and cooling will be, and in some cases already are, the biggest limitations," he said, adding that memory bandwidth must double every two years to keep pace, which only compounds power demands.

Papermaster noted that future AI data centers could consume power on the scale of hundreds of megawatts. He even joked about the possibility of "15 manufacturers of small nuclear reactors," a tongue-in-cheek reference to the industry's surging energy demands.

On the product front, Papermaster highlighted AMD's growing momentum with its Instinct accelerators. The MI300X and MI300A have already gained traction, generating $5 billion in their first year and capturing five percent of the market. Looking ahead, he confirmed the upcoming launch of the MI355X, which will offer up to 35 times the inference performance of its predecessor in certain workloads. The new accelerator will be available in both air- and water-cooled variants, underscoring AMD's continued focus on performance and efficiency.

Papermaster also touched on the potential of in-memory computing as a future leap in energy efficiency, confirming that AMD is actively exploring the technology.

Cost remains a critical concern for professional users, Papermaster acknowledged, as organizations must weigh the trade-off between significant upfront investment and the ongoing expense of extended system runtimes. While AMD continues to support open-source and freely available tools, he admitted that development of its new 2nm chip was "not open - and extremely expensive," requiring significantly more time and resources than previous designs. Many of the tools needed for such advanced chips are proprietary and will likely remain so for the foreseeable future.
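As a rough illustration of the hybrid-precision approach Papermaster describes, a classic HPC technique is to do the expensive work in low precision and recover full FP64 accuracy with iterative refinement. The NumPy sketch below uses FP32 as a stand-in for the low-precision format; it is illustrative only, not an AMD code path.

```python
# Mixed-precision iterative refinement: factor/solve in low precision,
# refine residuals in FP64. Illustrative sketch, not production code.
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned system
b = rng.standard_normal(n)

# Low-precision solve (FP32 here as a stand-in for FP16/FP8).
A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

# Refinement loop: residuals and corrections accumulated in FP64.
for _ in range(5):
    r = b - A @ x                                  # FP64 residual
    dx = np.linalg.solve(A32, r.astype(np.float32))
    x += dx.astype(np.float64)

print("residual norm:", np.linalg.norm(b - A @ x))
```

The cheap low-precision solves do most of the work, while a handful of FP64 residual corrections restore double-precision accuracy, which is the essence of the "hybrid approach" argument.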
On the topic of connectivity, Papermaster pointed to "Ultra Ethernet" as a promising technology that could help lower infrastructure costs. He noted that Nvidia, feeling the competitive pressure, has begun opening up its NVLink interconnect to other vendors in response - an effort to counter Ultra Ethernet's disruptive potential. A wider array of options in this space, he suggested, could ultimately help drive down costs and foster a more competitive ecosystem.

When asked about the role of RISC-V, Papermaster clarified that while AMD is already using RISC-V in certain applications, it is "currently unrealistic to classify these chips as supercomputers." He emphasized that RISC-V remains in its early stages of maturity and will require substantial development before it can compete at the highest performance tiers.

Looking ahead, Papermaster envisioned the ideal supercomputer as a highly versatile system capable of handling a broad spectrum of workloads. He acknowledged that this goal remains a long-term aspiration, and that specialized architectures will continue to dominate for the foreseeable future. However, modular construction - exemplified by projects like Jupiter - provides a promising path forward. Future supercomputers are likely to be assembled from many interconnected components, each optimized for specific tasks. Achieving high performance across such diverse workloads, he said, will require a blend of emerging technologies, including optical interconnects and eventual support for quantum computing.
AMD sets an ambitious target to boost AI chip energy efficiency 20-fold by 2030, focusing on rack-scale architectures and addressing power constraints in next-generation supercomputers.
AMD has set an ambitious target to boost the energy efficiency of its AI chips 20-fold before 2030. This initiative, dubbed "20x30," comes at a time when Moore's Law is slowing and datacenter power consumption is a growing concern. AMD's strategy hinges on embracing rack-scale architectures as a key design point for reaching this goal [1].
Sam Naffziger, AMD SVP and Fellow, explains the counterintuitive nature of this approach: "The bigger the device, the more efficient it is." This philosophy builds on AMD's successful chiplet architecture, which has already enabled the company to overcome reticle limits and improve performance per watt [1].
AMD's focus on rack-scale computing represents a significant shift in design philosophy. By architecting almost at the datacenter level, AMD aims to deliver continued significant improvements in efficiency. The approach isn't entirely new: Nvidia introduced its first rack-scale system, the GB200 NVL72, at GTC last year [1].
AMD's first rack-scale compute platform, the MI400, is scheduled to launch next year. It will follow the same basic formula as Nvidia's NVL systems, but will use the Universal Accelerator Link (UALink) interconnect rather than NVLink. Future designs may incorporate more advanced technologies, such as photonic interconnects, which could replace copper in scale-up fabrics within the next five years [1].
As computational performance continues to scale, power consumption and cooling have become critical limiting factors. AMD CTO Mark Papermaster highlighted this issue in his keynote at ISC 2025, noting that the total board power of accelerators is projected to reach 1,600 watts or more by 2026 or 2027, with 2,000 watts potentially on the horizon [2].
Papermaster even joked about the possibility of "15 manufacturers of small nuclear reactors," underscoring the industry's surging energy demands. Future AI data centers could consume power on the scale of hundreds of megawatts, making efficiency improvements crucial [2].
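For a sense of scale, a back-of-the-envelope calculation shows how those per-accelerator figures translate into datacenter-level demand. The accelerator count and overhead factor below are assumptions for illustration, not figures from either source.

```python
# Back-of-the-envelope arithmetic behind "hundreds of megawatts".
board_power_w = 2_000   # projected per-accelerator board power (from keynote)
accelerators = 100_000  # assumed size of a large AI cluster
overhead = 1.3          # assumed cooling/network/host overhead (PUE-like)

total_mw = board_power_w * accelerators * overhead / 1e6
print(f"~{total_mw:.0f} MW")  # ~260 MW -- hundreds of megawatts
```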
AMD's Instinct accelerators, including the MI300X and MI300A, have gained significant traction in the market, generating $5 billion in their first year and capturing five percent of the market share. The company is now preparing to launch the MI355X, which promises up to 35 times the inference performance of its predecessor in certain workloads [2].
The MI355X will be available in both air- and water-cooled variants, reflecting AMD's continued focus on performance and efficiency. Additionally, AMD is exploring in-memory computing as a potential leap forward in energy efficiency [2].
Papermaster envisions future supercomputers as highly versatile systems capable of handling a broad spectrum of workloads. While this remains a long-term goal, the industry is moving towards modular construction, exemplified by projects like Jupiter. These systems will likely be assembled from many interconnected components, each optimized for specific tasks [2].
To achieve high performance across diverse workloads, future supercomputers will incorporate a blend of emerging technologies, including optical interconnects and eventual support for quantum computing. However, Papermaster clarified that while AMD is using RISC-V in certain applications, it is "currently unrealistic to classify these chips as supercomputers" due to the architecture's early stage of maturity [2].
As the industry grapples with the challenges of explosive growth and increasing complexity, AMD's 20x30 initiative represents a bold step towards a more efficient and sustainable future for AI and high-performance computing.