3 Sources
[1]
A project to bring CUDA to non-Nvidia GPUs is making major progress -- ZLUDA update now has two full-time developers, working on 32-bit PhysX support and LLMs, amongst other things
ZLUDA, a CUDA translation layer that nearly shut down last year before being rescued by an unknown party, this week shared an update on its steady technical progress and team expansion over the last quarter, reports Phoronix. The project continues to build out its ability to run CUDA workloads on non-Nvidia GPUs; for now, the focus is primarily on AI workloads. That said, work has also begun on enabling 32-bit PhysX support, which is required for compatibility with older CUDA-based games.

Perhaps the most important news for the ZLUDA project is that its development team has grown from one full-time developer to two. The second developer, Violet, joined less than a month ago and has already delivered important improvements, particularly in advancing support for large language model (LLM) workloads through the llm.c project, according to the update.

A community contributor named @Groowy began the initial work on 32-bit PhysX support in ZLUDA by collecting detailed CUDA logs, which quickly revealed several bugs. Since some of these problems could also affect 64-bit CUDA functionality, fixing them was added to the official roadmap. However, completing full 32-bit PhysX support will still rely on further help from open-source contributors.

The llm.c effort centers on a small example program that runs a GPT-2 model using CUDA. Though modest in size, the test is important because it is the first time ZLUDA has had to handle both core CUDA functions and specialized libraries such as cuBLAS (Nvidia's fast linear-algebra library). The program makes 8,186 separate calls to CUDA functions, spread across 44 different APIs. Initially, ZLUDA crashed on the very first call; thanks to many updates contributed by Violet, it now gets all the way to the 552nd call before failing.
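The "fails at call N" progress metric can be pictured with a toy dispatch model (purely illustrative -- none of this is ZLUDA's actual code): a translation layer maps each CUDA entry point to a host-side implementation, and a recorded trace is replayed until the first call that has no implementation yet.

```python
# Toy model of a CUDA translation layer: a dispatch table maps CUDA API
# names to stand-in host implementations. A recorded trace is replayed
# call by call, and progress is measured by how far it gets before the
# first unimplemented entry -- mirroring how ZLUDA's llm.c run advanced
# from failing on call 1 to failing on call 552 as coverage grew.

# Hypothetical backend stubs; real ones would drive the GPU.
IMPLEMENTED = {
    "cuInit":       lambda *a: 0,
    "cuDeviceGet":  lambda *a: 0,
    "cuCtxCreate":  lambda *a: 0,
    "cuMemAlloc":   lambda *a: 0,
    "cuMemcpyHtoD": lambda *a: 0,
}

def replay(trace):
    """Replay a recorded API trace; return the 1-based index of the
    first unimplemented call, or None if every call succeeds."""
    for i, (name, args) in enumerate(trace, start=1):
        impl = IMPLEMENTED.get(name)
        if impl is None:
            return i          # first failing call
        impl(*args)
    return None

trace = [
    ("cuInit", (0,)),
    ("cuDeviceGet", (0,)),
    ("cuCtxCreate", (0, 0)),
    ("cublasCreate_v2", ()),   # library call not yet covered
    ("cuMemAlloc", (1024,)),
]
print(replay(trace))  # -> 4
```

Growing the `IMPLEMENTED` table is what pushes the failure point further into the trace, which is why "first failing call" is a meaningful coverage metric.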
The team has already completed support for 16 of the 44 needed functions, bringing it closer to running the whole test successfully. Once this works, it will pave the way for ZLUDA to support bigger software such as PyTorch.

ZLUDA's core objective is to run standard CUDA programs on non-Nvidia GPUs while matching the behavior of Nvidia hardware as precisely as possible. This means each instruction must either deliver identical results down to the last bit or stay within strict numerical tolerances compared to Nvidia hardware. Earlier versions of ZLUDA, before the major code reset, often compromised on accuracy by skipping certain instruction modifiers or failing to maintain full precision. The current implementation has made substantial progress in fixing this. To ensure accuracy, the project runs PTX 'sweep' tests -- systematic checks using Nvidia's intermediate GPU language -- to confirm that every instruction and modifier combination produces correct results across all inputs, an approach ZLUDA had not used before. Running these checks revealed several compiler defects, which have since been addressed. ZLUDA admits that not every instruction has completed this rigorous validation yet, but stresses that some of the most complex cases -- such as the cvt instruction -- are now confirmed bit-accurate.

The foundation for getting any CUDA-based software to work on ZLUDA -- whether a game, a 3D application, or an ML framework -- is having logs of how the program communicates with CUDA. That means tracking direct API calls, undocumented parts of the CUDA runtime (or drivers), and any use of specialized performance libraries. With the recent update, ZLUDA's logging system has been significantly upgraded: the new implementation captures a wider range of activity that was not visible before, including detailed traces of internal behavior, such as when cuBLAS relies on cuBLASLt or how cuDNN interacts with the lower-level Driver API.
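Capturing "internal" calls of the kind described above -- one library quietly driving another -- is the interesting part of such logging. As a rough illustration (not ZLUDA's code; every function name below is a stand-in), a tracing wrapper around each entry point records both the top-level call and the intermediate call it triggers:

```python
import functools

# Toy version of API-call logging: wrap each entry point so every call
# is appended to a trace, including the "intermediate" calls one
# library makes into another (the way cuBLAS internally drives
# cuBLASLt). All function names here are illustrative stand-ins.

TRACE = []

def traced(name):
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            TRACE.append(name)   # log before dispatching to the impl
            return fn(*args, **kwargs)
        return wrapper
    return deco

@traced("cublasLtMatmul")
def lt_matmul(a, b):
    # Naive matrix multiply standing in for the low-level kernel.
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]

@traced("cublasSgemm")
def sgemm(a, b):
    return lt_matmul(a, b)       # high-level call uses the Lt path internally

sgemm([[1, 2]], [[3], [4]])
print(TRACE)  # -> ['cublasSgemm', 'cublasLtMatmul']
```

The point of the sketch: a log that only hooked the public `sgemm`-level entry point would miss the nested `lt_matmul` call, which is exactly the class of activity the upgraded logging now surfaces.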
Modern GPU frameworks such as CUDA, ROCm/HIP, ZLUDA, and OpenCL all need to compile device code dynamically while applications run, so that older GPU programs can still be built and executed correctly on newer hardware generations without changes to the original code. In AMD's ROCm/HIP ecosystem, this on-the-fly compilation depends on comgr (the code object manager library from the ROCm-CompilerSupport project), a compact library with extensive capabilities for compiling, linking, and disassembling code, available on both Linux and Windows.

With ROCm/HIP version 6.4 came a significant application binary interface (ABI) change: the numeric codes representing actions were rearranged in a new v3 ABI. This caused ZLUDA to accidentally invoke the wrong operations -- for example, attempting to link instead of compile -- which led to errors. The situation was worse on Windows, where the library reported itself as version 2.9 but internally used the v3 ABI, mixing behaviors. These problems were also addressed recently by the ZLUDA team.
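The failure mode of an enum renumbering like this is easy to demonstrate in miniature. The numeric values and the v2-to-v3 mapping below are invented for illustration (the real comgr action codes differ); the mechanism -- a caller baking in old integer codes that a newer library reinterprets -- is the point:

```python
# Illustration (with made-up numeric values) of why an ABI renumbering
# like comgr's v2 -> v3 action-kind change breaks callers: the caller
# compiles in the old integer codes, but the library interprets them
# with the new layout, so a "compile" request may dispatch as "link".

ACTIONS_V2 = {1: "compile", 2: "assemble", 3: "link"}
ACTIONS_V3 = {1: "link", 2: "compile", 3: "assemble"}  # renumbered

def do_action(code, abi):
    """Stand-in for the library's action dispatcher."""
    table = ACTIONS_V3 if abi == 3 else ACTIONS_V2
    return table[code]

# A caller built against v2 headers asks for "compile" via code 1...
code_for_compile_v2 = 1
# ...and gets the wrong operation from a v3 library:
print(do_action(code_for_compile_v2, abi=3))  # -> 'link'

# The fix is to detect the library's actual ABI and translate codes:
V2_TO_V3 = {1: 2, 2: 3, 3: 1}
print(do_action(V2_TO_V3[code_for_compile_v2], abi=3))  # -> 'compile'
```

The Windows case the article describes is worse still: if the library misreports its version (claiming 2.9 while using the v3 layout), even a caller that checks the version before translating codes would pick the wrong table.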
[2]
Open source project is making strides in bringing CUDA to non-Nvidia GPUs
Why it matters: Nvidia introduced CUDA in 2006 as a proprietary API and software layer that eventually became the key to unlocking the immense parallel computing power of GPUs. CUDA plays a major role in fields such as artificial intelligence, scientific computing, and high-performance simulations, but running CUDA code has remained largely locked to Nvidia hardware. Now, an open-source project is working to break that barrier. By enabling CUDA applications to run on third-party GPUs from AMD, Intel, and others, this effort could dramatically expand hardware choice, reduce vendor lock-in, and make powerful GPU computing more accessible than ever.

The Zluda team recently shared its latest quarterly update, confirming that the project remains focused on fully implementing CUDA compatibility on non-Nvidia graphics accelerators. Zluda's stated goal is to offer a drop-in replacement for CUDA on AMD, Intel, and other GPU architectures - allowing users and developers to run unmodified CUDA-based applications with "near-native" performance.

The most promising change for Zluda is that its team has doubled in size: there are now two full-time developers working on the project. The newly added developer, known as "Violet," has already made notable contributions to the tool's official open-source repository on GitHub.

Other important updates involve improvements to the ROCm/HIP GPU runtime, which should now function reliably on both Linux and Windows. GPU runtimes like CUDA and ROCm are designed to compile GPU code at runtime, ensuring that code developed for older hardware can typically compile and run on newer GPU architectures with minimal issues.

Zluda is also now significantly better at executing unmodified CUDA binaries on non-Nvidia GPUs. Previously, the tool either ignored certain instruction modifiers or failed to execute them with full precision.
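The runtime-compilation model mentioned above can be sketched as a compile-on-demand cache keyed by GPU architecture. This is an illustration with invented names, not the actual CUDA or ROCm mechanism: the application ships portable intermediate code, and the runtime lowers it for whatever GPU is actually present.

```python
# Sketch of why GPU runtimes compile device code at run time: the app
# ships portable intermediate code (PTX for CUDA, LLVM IR for HIP),
# and the runtime lowers it for the GPU that is actually installed,
# caching one binary per architecture. All names here are invented.

_cache = {}

def compile_for_arch(ir, arch):
    """Stand-in for a JIT backend (the role nvrtc or comgr plays)."""
    return f"binary({hash(ir) & 0xffff:04x}-{arch})"

def load_module(ir, arch):
    """Return a device binary for `arch`, compiling on first use."""
    key = (ir, arch)
    if key not in _cache:
        _cache[key] = compile_for_arch(ir, arch)
    return _cache[key]

ir = "ptx: add.s32 %r1, %r2, %r3;"
old = load_module(ir, "gfx1030")   # app written when gfx1030 shipped
new = load_module(ir, "gfx1201")   # same IR on a newer GPU: recompiled
print(old != new, load_module(ir, "gfx1030") is old)  # True True
```

Because the source-level IR, not the device binary, is the contract, code written for older hardware keeps working on architectures that did not exist when it shipped -- which is exactly the property Zluda has to reproduce on non-Nvidia runtimes.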
Now, the improved code can handle some of the trickiest cases - such as the cvt instruction - with bit-accurate precision.

A key step in fully supporting CUDA applications is tracking how code interacts with the API through detailed logging. Zluda has improved in this area as well: it can now capture previously overlooked interactions and even handle intermediate API calls.

The developers also made meaningful progress in supporting llm.c, a pure-CUDA test implementation (written in C) of language models like GPT-2 and GPT-3. Zluda currently implements 16 of the 44 functions llm.c needs, and the team hopes to run the full test soon.

Finally, Zluda has advanced slightly in its potential support for 32-bit PhysX code. Nvidia dropped both hardware and software support for this middleware with the Blackwell-based GeForce 50 series GPUs, leaving fans of older games with what can essentially be described as a broken or subpar experience. In the past quarter, Zluda received a minor update related to 32-bit PhysX support: the initial focus is on efficiently collecting CUDA logs to identify potential bugs, which can also affect 64-bit code. However, the developers caution that full 32-bit PhysX support will likely require significant contributions from third-party coders.
[3]
Open-Source Library ZLUDA Sees Major Progress in Bringing NVIDIA's CUDA Code to Other GPUs; Doubles Developer Count
ZLUDA has made major headlines in the past with its "code porting" library, and while development slowed over the past few months, the developers appear to be geared up once again. For those unaware, the ZLUDA library made headlines last year: it was initially designed to let NVIDIA's CUDA software stack run on Intel GPUs, but AMD eventually took over the project and, together with multiple developers, shaped it into a tool that could bring NVIDIA's CUDA to AMD's own AI hardware -- seen as a massive breakthrough for the open-source community. However, AMD decided to scrap the project due to legal concerns; now ZLUDA is back, this time with a bang.

In a report by Phoronix, it is revealed that ZLUDA is developing a "multi-vendor" solution for users looking to port NVIDIA's CUDA code to run on GPUs from other manufacturers, and over the past few months in particular, the developers have become more active in taking ZLUDA to the next stage. There are now two individuals working full-time on the project, which means faster development and deployment, and ultimately a chance to make NVIDIA's CUDA a more universal platform. Apart from this, ZLUDA has made several optimizations to its tech stack, bringing bit-accurate execution across GPUs and progress on NVIDIA's PhysX support.

For now, taking ZLUDA to the stage where it is broadly applicable will require a lot of time, and there is no defined timeline for when the library could go live, but the project is an optimistic one, and we will definitely be looking forward to it. If the project proves a success, we might see the exclusivity boundaries in AI software stacks break down, allowing architectures to leverage each other's capabilities for an optimal end result.
NVIDIA has kept CUDA almost "inaccessible" to other vendors, and AMD has since shifted its focus to the ROCm stack; if ZLUDA goes live, it could act as a bridge between the two.
The open-source ZLUDA project is making significant progress in enabling CUDA compatibility on non-NVIDIA GPUs, potentially expanding hardware choices for AI and scientific computing.
The ZLUDA project, an open-source initiative aimed at enabling NVIDIA's CUDA to run on non-NVIDIA GPUs, has reported significant progress in its latest update [1]. The project, which nearly shut down last year but was saved by an unknown party, has expanded its development team and made substantial technical advancements [1][2].
ZLUDA's development team has doubled in size, now comprising two full-time developers [1][2]. The new developer, Violet, has already made notable contributions, particularly in advancing support for large language model (LLM) workloads through the llm.c project [1]. The team's current focus is more on AI applications rather than other areas, although work has begun on enabling 32-bit PhysX support [1][3].
Source: Tom's Hardware
The developers are working on a test project called llm.c, a small program that attempts to run a GPT-2 model using CUDA [1]. This test involves 8,186 separate calls to CUDA functions across 44 different APIs. ZLUDA has already completed support for 16 of the 44 needed functions, marking significant progress towards running the entire test successfully [1].
ZLUDA has made substantial progress in ensuring bit-accurate execution of CUDA instructions on non-NVIDIA GPUs [1][2]. The team has implemented PTX 'sweep' tests to confirm that every instruction and modifier combination produces correct results across all inputs [1].
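A sweep test of this kind can be sketched in miniature (illustrative only -- the real PTX tests run against GPU hardware over full input ranges): pick an instruction's modifiers, enumerate inputs, and compare a candidate implementation bit-for-bit against a reference. Here PTX's cvt float-to-integer rounding modifiers are modeled with Python equivalents, and the candidate is deliberately the reference itself, so the harness reports zero mismatches.

```python
import math

# PTX's cvt instruction converts between types under an explicit
# rounding modifier. A "sweep" test checks every (modifier, input)
# combination against a reference. The modifier semantics follow the
# PTX ISA (.rni = nearest-even, .rzi = toward zero, .rmi = toward
# -inf, .rpi = toward +inf); the harness itself is illustrative.

ROUND = {
    "rni": lambda x: int(round(x)),   # Python round() is half-to-even
    "rzi": math.trunc,
    "rmi": math.floor,
    "rpi": math.ceil,
}

def cvt_f32_s32(x, mode):
    """Candidate implementation under test (here just the reference,
    so the sweep below finds no mismatches)."""
    return ROUND[mode](x)

def sweep():
    """Compare candidate vs. reference over a grid of inputs and every
    rounding modifier; return the list of mismatches."""
    mismatches = []
    inputs = [k / 4.0 for k in range(-64, 65)]  # includes .25/.5/.75 cases
    for mode, ref in ROUND.items():
        for x in inputs:
            got, want = cvt_f32_s32(x, mode), ref(x)
            if got != want:
                mismatches.append((mode, x, got, want))
    return mismatches

print(len(sweep()))  # -> 0
```

The value of sweeping every modifier-input combination, rather than spot-checking, is that it catches the subtle cases (halfway values, negative rounding directions) where an emulated instruction most often diverges from Nvidia's behavior.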
The project has significantly upgraded its logging system, capturing a wider range of activity that was previously invisible [1]. This includes detailed traces of internal behavior, such as cuBLAS relying on cuBLASLt and cuDNN interacting with the lower-level Driver API [1].
Source: TechSpot
ZLUDA has addressed issues related to the ROCm/HIP ecosystem, particularly concerning the comgr library and recent ABI changes [1]. These improvements ensure better compatibility on both Linux and Windows platforms [1][2].
Source: Wccftech
By enabling CUDA applications to run on third-party GPUs from AMD, Intel, and others, ZLUDA could dramatically expand hardware choices, reduce vendor lock-in, and make powerful GPU computing more accessible [2]. This effort has the potential to break down exclusivity boundaries in AI software stacks, allowing different architectures to leverage each other's capabilities [3].
While ZLUDA has made significant progress, the developers caution that full 32-bit PhysX support will likely require substantial contributions from third-party coders [1][2]. The project's timeline for full implementation remains undefined, but the recent advancements and increased development capacity suggest a promising future for CUDA compatibility across diverse GPU architectures [2][3].
Summarized by
Navi