4 Sources
[1]
AMD unveils ROCm 7 -- new platform boosts AI performance up to 3.5x, adds Radeon GPU support
AMD this week introduced the 7th version of its ROCm (Radeon Open Compute) open-source software stack for accelerated computing. Compared to ROCm 6, the new release substantially improves AI inference performance on existing hardware, adds support for distributed workloads, and expands to Windows and Radeon GPUs. In addition, ROCm 7 adds support for FP4 and FP6 low-precision formats for the latest Instinct MI350X/MI355X processors.

The biggest change brought by ROCm 7 for client PCs is the extension of ROCm to Windows and Radeon GPUs, which allows the use of discrete and integrated GPUs for AI workloads, though only on Ryzen-based PCs. Starting in the second half of 2025, developers will be able to build and run AI programs on Ryzen desktops and laptops with Radeon GPUs, which could be a big deal for those who want to run higher-end LLMs locally.

One of the reasons for AMD's weak position in the AI hardware market is imperfect software, but the situation appears to be improving: according to AMD, its Instinct MI300X with ROCm 7 delivers over 3.5 times the inference performance and 3 times the training throughput compared to ROCm 6. The company conducted tests on an 8-way Instinct MI300X machine running Llama 3.1-70B, Qwen 72B, and DeepSeek-R1 models with batch sizes ranging from 1 to 256; the only difference between the runs was the use of ROCm 7 instead of ROCm 6. AMD attributes the improvements to enhancements in GPU utilization and data movement, though it does not provide further details.

The new release also introduces support for distributed inference through integration with open frameworks such as vLLM, SGLang, and llm-d. AMD worked with these partners to build shared components and primitives, allowing the software to scale efficiently across multiple GPUs.
Furthermore, ROCm 7 adds support for lower-precision data types like FP4 and FP6, which will bring tangible improvements for the company's latest CDNA 4-based Instinct MI350X/MI355X processors, as well as the upcoming CDNA 5-based MI400X and next-generation Instinct MI500X-series products that will succeed the Instinct MI300 series in 2026 and 2027, respectively.

Along with ROCm 7, AMD also introduced its ROCm Enterprise AI MLOps solution tailored for enterprise use. The platform offers tools for refining models using domain-specific datasets and supports integration into both structured and unstructured workflows. AMD said it works with ecosystem partners to build reference implementations for applications such as chatbots and document summarization in a bid to make AMD hardware suitable for rapid deployment in production environments.

Last but not least, AMD also launched its Developer Cloud, which provides ready-to-use access to MI300X hardware with configurations ranging from a single MI300X GPU with 192 GB of memory to eight-way MI300X setups with 1,536 GB of memory. For starters, AMD provides 25 free usage hours, and additional credits are available through developer programs. Early support for Instinct MI350X-based systems is also planned.
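As a rough illustration of what an FP4 datatype implies, here is a minimal pure-Python sketch of round-to-nearest quantization onto the value grid of a 4-bit E2M1 float (one sign bit, two exponent bits, one mantissa bit — the layout commonly used for FP4; the articles do not specify the exact format, so treat the grid as an assumption):

```python
# Representable magnitudes of a hypothetical E2M1 FP4 format:
# one subnormal (0.5) plus normals 1, 1.5, 2, 3, 4, 6, and zero.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 value (sketch only)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 6.0)  # saturate at the largest representable magnitude
    # Round to nearest grid point; ties resolve to the smaller grid value.
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return sign * nearest

print([quantize_fp4(v) for v in [0.3, 1.2, 2.4, 7.0, -0.6]])
# -> [0.5, 1.0, 2.0, 6.0, -0.5]
```

The coarse grid is the point: with only 16 representable values per element, weights shrink to a quarter of their FP16 size, and the hardware trades precision for throughput.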
[2]
AMD wants to beat Nvidia at its own game with open-source software
Summary: AMD launches ROCm 7 with Windows support in August through ONNX-EP or PyTorch. According to AMD, DeepSeek R1 shows 30% faster FP8 throughput on its hardware than on Nvidia's B200, thanks to ROCm's open-source approach. With Windows support, AMD says you'll be able to run AI models through ROCm on everything from a Ryzen AI 300 laptop to a Threadripper workstation.

AMD thinks Nvidia has it wrong. The future of AI is built on open-source software, and it looks like we're reaching an inflection point. AMD revealed ROCm 7 at Advancing AI 2025, the latest version of AMD's open-source software stack built to compete with CUDA. ROCm 7 brings a ton of new features, but more important than all of them is Windows support. When ROCm 7 launches in August, AMD says it'll work on Windows through the ONNX-EP framework, and it'll roll out support for PyTorch in Q3. In addition, AMD is doubling down on Linux, adding in-box support for ROCm in Ubuntu, OpenSUSE, and Red Hat.

AMD says open-source is the way to go for AI software, and ROCm 7 is when it all comes together, it looks like. Although AMD has made inroads into the AI market, Nvidia still dominates -- some estimates place Nvidia's market share above 90%. A large part of that comes down to CUDA, the software stack Nvidia started developing some 15 years ago. ROCm is AMD's answer, and according to Team Red, its open-source approach is starting to show some performance advantages. With DeepSeek R1, AMD says its new Instinct MI355X GPU is 30% faster than Nvidia's B200 in FP8 throughput, largely because the model doesn't work in FP8 with Nvidia's TensorRT-LLM, an issue a developer raised on the TensorRT GitHub just a day ago.
"[CUDA] is not a moat for new architectures," said Ramine Roane, Corporate Vice President of AMD's AI Solutions Group, in a press briefing. "Every time there is a major new architecture, we are on the same playing field. There is no moat. It's whoever is going to write the new kernels faster who is going to win. And we're going to win because we work with open-source." In this case, Roane's use of the word "kernel" refers to functions within an AI model, such as matrix multiplication -- it's not the same thing as an OS kernel. In addition to ROCm, AMD has been pushing its HIP stack for porting CUDA code to a hardware-agnostic platform.

Although ROCm has been around for a while, it hasn't made inroads into the wider AI market, largely because it didn't work on Windows. ROCm 7 changes that. In addition to supporting ROCm on Windows, AMD is bringing support for non-data center GPUs like the new Radeon AI Pro R9700 that AMD announced at Computex, and even new Radeon GPUs like the RX 9070 XT and RX 9060 XT. In fact, AMD says you'll be able to run AI software through ROCm on everything from a Ryzen AI 300 chip up to a Threadripper workstation. ROCm 7 is available in preview starting today through the ONNX-EP framework, but it'll roll out more broadly starting in August.
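The ONNX-EP route mentioned above follows ONNX Runtime's usual pattern of picking an execution provider with a CPU fallback. A minimal pure-Python sketch of that selection logic (the provider names mirror ONNX Runtime conventions; which providers will actually ship in the Windows ROCm preview is an assumption):

```python
# Hypothetical sketch of execution-provider fallback: prefer an AMD
# accelerator provider if the runtime reports it, else fall back to CPU.
def pick_providers(available: list[str]) -> list[str]:
    preferred = [
        "ROCMExecutionProvider",      # ROCm-accelerated EP
        "MIGraphXExecutionProvider",  # AMD's graph-optimizing EP
        "CPUExecutionProvider",       # always-available fallback
    ]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# e.g. on a CPU-only machine:
print(pick_providers(["CPUExecutionProvider"]))  # -> ['CPUExecutionProvider']
```

In a real application the `available` list would come from the runtime's provider query, and the chosen list would be passed when creating the inference session.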
[3]
AMD's powerful AI chips can finally be unleashed on Windows PCs
AMD's hardware teams have tried to redefine AI inferencing with powerful chips like the Ryzen AI Max and Threadripper. But in software, the company has been largely absent where PCs are concerned. That's changing, AMD executives say.

AMD's Advancing AI event Thursday focused on enterprise-class GPUs like its Instinct lineup. But it's a software platform you may not have heard of, called ROCm, that AMD depends upon just as much. AMD is releasing ROCm 7 today, which the company says can boost AI inferencing by three times through the software alone. And it's finally coming to Windows to battle Nvidia's CUDA supremacy.

Radeon Open Compute (ROCm) is AMD's open software stack for AI computing, with drivers and tools to run AI workloads. Remember the Nvidia GeForce RTX 5060 debacle of a few weeks back? Without a software driver, Nvidia's latest GPU was a lifeless hunk of silicon. Early on, AMD was in the same pickle. Without the limitless coffers of companies like Nvidia, AMD made a choice: it would prioritize big businesses with ROCm and its enterprise GPUs instead of client PCs. Ramine Roane, corporate vice president of the AI solutions group, called that a "sore point": "We focused ROCm on the cloud GPUs, but it wasn't always working on the endpoint -- so we're fixing that."

In today's world, simply shipping the best product isn't always enough. Capturing customers and partners willing to commit to the product is a necessity. It's why former Microsoft CEO Steve Ballmer famously chanted "Developers developers developers" on stage; when Sony built a Blu-ray drive into the PlayStation, movie studios gave the new video format a critical mass that the rival HD-DVD format didn't have. Now, AMD's Roane said that the company belatedly realized that AI developers like Windows, too. "It was a decision to basically not use resources to port the software to Windows, but now we realize that, hey, developers actually really care about that," he said.
ROCm will be supported by PyTorch in preview in the third quarter of 2025, and by ONNX-EP in July, Roane said. All this means that AMD processors will finally gain a much larger presence in AI applications: if you own a laptop with a Ryzen AI processor, a desktop with a Ryzen AI Max chip, or a desktop with a Radeon GPU inside, it will have more opportunities to tap into AI applications. PyTorch, for example, is a machine-learning library that popular AI models like Hugging Face's "Transformers" run on top of, so it should become much easier for AI models to take advantage of Ryzen hardware. ROCm will also be added "in box" to Linux distributions: Red Hat (in the second half of 2025), Ubuntu (the same), and SuSE. Roane also helpfully provided some context on what model size each AMD platform should be able to run, from a Ryzen AI 300 notebook on up to a Threadripper platform.

The AI performance improvements that ROCm 7 adds are substantial: a 3.2X performance improvement in Llama 3.1 70B, 3.4X in Qwen2-72B, and 3.8X in DeepSeek R1. (The "B" stands for the number of parameters, in billions; generally, the higher the parameter count, the higher the quality of the outputs.) Those numbers matter more than they have in the past, as Roane said that inferencing chips are showing steeper growth than processors used for training. ("Training" generates the AI models used in products like ChatGPT or Copilot; "inferencing" refers to the actual process of using AI. In other words, you might train an AI to know everything about baseball; when you ask it whether Babe Ruth was better than Willie Mays, you're using inferencing.) AMD also said that the improved ROCm stack delivers a similar uplift in training, about three times the previous generation. Finally, AMD said that its own MI355X running the new ROCm software would outperform an Nvidia B200 by 1.3X on the DeepSeek R1 model with 8-bit floating-point accuracy.
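Those parameter counts translate directly into memory requirements, which is where the lower-precision formats pay off and why the per-platform model-size guidance matters. A back-of-the-envelope sketch:

```python
# Rough sketch: memory needed just to hold the weights of an
# N-billion-parameter model at a given precision. This ignores
# activations, the KV cache, and runtime overhead, so real
# requirements are higher.
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):  # FP16, FP8, FP4
    print(f"70B model at {bits}-bit: {weights_gb(70, bits):.0f} GB of weights")
# 16-bit -> 140 GB, 8-bit -> 70 GB, 4-bit -> 35 GB
```

At 16-bit precision a 70B model needs roughly 140 GB for weights alone, dropping to about 35 GB at FP4, which illustrates why low-precision datatypes are central to running large models on client hardware.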
Again, performance matters -- in AI, the goal is to push out as many AI tokens as quickly as possible; in games, it's polygons or pixels instead. Simply offering developers a chance to take advantage of the AMD hardware you already own is a win-win, for you and AMD alike. The one thing that AMD doesn't have is a consumer-focused application to encourage users to use AI, whether it be LLMs, AI art, or something else. Intel publishes AI Playground, and Nvidia (though it doesn't own the technology) worked with a third-party developer for its own application, LM Studio. One of the convenient features of AI Playground is that every model available has been quantized, or tuned, for Intel's hardware. Roane said that similarly-tuned models exist for AMD hardware like the Ryzen AI Max. However, consumers have to go to repositories like Hugging Face and download them themselves. Roane called AI Playground a "good idea." "No specific plans right now, but it's definitely a direction we would like to move," he said, in response to a question from PCWorld.com.
[4]
AMD ROCm 7 Announced: MI350 Support, New Algorithms, Models & Advanced Features For AI Added, Focus on Inference With 3.5x Uplift
AMD goes official with the next version of its open software stack in the form of ROCm 7, which further accelerates AI and developer productivity.

AMD Unveils ROCm 7: The Next Generation of Open Stack Software Innovations With a Focus on AI Inferencing

With the announcement of ROCm 7, AMD is finally moving forward from its ROCm 6 software stack, which has itself seen various updates over the last few years since the advent of AI computing. With ROCm 7, AMD says it is focusing more on growing the inference capabilities of its software stack. The ROCm 7 stack will include enhanced frameworks such as vLLM v1, llm-d, and SGLang, and also focuses on serving optimizations such as distributed inference and prefill/decode disaggregation. New kernels and algorithms coming to ROCm 7 include GEMM autotuning, MoE, attention, and Python-based kernel authoring. AMD has already announced FP6 and FP4 support for its MI350 series, and ROCm 7 includes full support for these advanced datatypes, alongside FP8 and mixed precision. In terms of performance, AMD says that inference has been the largest area of focus with ROCm 7, adding up to 3.5x performance uplifts in AI workloads. Breaking down the performance uplifts, we can see up to a 3.2x increase in Llama 3.1 70B, a 3.4x increase in Qwen2-72B, and up to 3.8x in DeepSeek R1, versus ROCm 6.
AMD introduces ROCm 7, a significant update to its open-source software stack for accelerated computing, bringing substantial AI performance improvements, Windows support, and expanded hardware compatibility.
AMD has introduced ROCm 7, the latest version of its Radeon Open Compute (ROCm) software stack, marking a significant advancement in accelerated computing and artificial intelligence (AI) performance [1]. This update brings substantial improvements to AI inference capabilities, expanded hardware support, and introduces Windows compatibility, potentially reshaping the landscape of AI computing.
ROCm 7 delivers impressive performance gains, with AMD reporting up to 3.5 times improvement in AI inference tasks compared to its predecessor, ROCm 6 [1]. Specific benchmarks show:
- Llama 3.1 70B: up to 3.2x faster
- Qwen2-72B: up to 3.4x faster
- DeepSeek R1: up to 3.8x faster
These enhancements are attributed to improved GPU utilization and optimized data movement, although AMD has not provided detailed explanations for these improvements [1].
A key feature of ROCm 7 is its extended support for a wider range of hardware:
- Windows PCs, via the ONNX-EP framework and, later, PyTorch
- Consumer Radeon GPUs, including the RX 9070 XT, RX 9060 XT, and Radeon AI Pro R9700
- Ryzen AI laptops and desktops, alongside the existing Instinct data-center lineup
This expansion allows for AI model execution across various AMD platforms, from Ryzen AI 300 laptops to high-end Threadripper workstations [2][3].
ROCm 7 introduces several new capabilities:
- Distributed inference through integration with open frameworks such as vLLM, SGLang, and llm-d
- Support for FP4 and FP6 low-precision datatypes on the Instinct MI350X/MI355X
- New kernels and algorithms, including GEMM autotuning, MoE, attention, and Python-based kernel authoring
- A ROCm Enterprise AI MLOps solution tailored for enterprise deployments
To facilitate adoption, AMD has launched its Developer Cloud, providing access to MI300X hardware configurations ranging from single-GPU setups to eight-way systems [1]. The company is also expanding ROCm support in popular Linux distributions, including Ubuntu, OpenSUSE, and Red Hat [2][3].
AMD is positioning ROCm 7 as a direct competitor to Nvidia's CUDA, emphasizing its open-source approach as a key advantage [2]. The company argues that this strategy allows for faster adaptation to new architectures and potentially better performance in certain scenarios, such as the reported 30% faster FP8 throughput on the DeepSeek R1 model compared to Nvidia's B200 [2].
The release of ROCm 7 represents a significant step in AMD's efforts to capture a larger share of the AI hardware and software market. By addressing previous limitations, such as the lack of Windows support, and focusing on both cloud and client-side AI applications, AMD is positioning itself as a strong contender in the rapidly evolving AI computing landscape [2][3].
As the AI industry continues to grow, the improvements and expanded compatibility offered by ROCm 7 could play a crucial role in democratizing access to AI technologies and fostering innovation across various computing platforms.