AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source

Date: unknown

Location: www.phoronix.com

While there have been efforts by AMD over the years to make it easier to port codebases targeting NVIDIA's CUDA API to run atop HIP/ROCm, it still requires work on the part of developers. The tooling has improved, such as with HIPIFY to help auto-generate HIP code, but it isn't a simple, instant, and guaranteed solution -- especially if striving for optimal performance. Over the past two years, though, AMD has quietly been funding an effort to bring binary compatibility so that many NVIDIA CUDA applications could run atop the AMD ROCm stack at the library level -- a drop-in replacement without the need to adapt source code. In practice, for many real-world workloads it's a solution for end-users to run CUDA-enabled software without any developer intervention. Here is more information on this "skunkworks" project that is now available as open-source, along with some of my own testing and performance benchmarks of this CUDA implementation built for Radeon GPUs.

You may recall ZLUDA from several years ago as the project for enabling CUDA support on Intel graphics. That open-source project aimed to provide a drop-in CUDA implementation on Intel graphics built atop Intel oneAPI Level Zero. While Intel considered funding ZLUDA's continued development, they ultimately turned down the idea, and the project was discontinued. But it turns out that the developer behind it, Andrzej Janik (who was also employed by Intel at the time), was contracted by AMD in 2022 to effectively adapt ZLUDA for use on AMD GPUs with HIP/ROCm.

Andrzej Janik spent the past two years bringing ZLUDA to Radeon GPUs, and it works: much CUDA software can run on HIP/ROCm without any modifications or other intervention. Just run the binaries as you normally would while ensuring that the ZLUDA library replacements for CUDA are loaded. For reasons unknown to me, AMD decided this year to discontinue funding the effort and not release it as any software product. But the good news is that the contract included a clause for this eventuality: Janik could open-source the work if/when the contract ended.
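To illustrate how a drop-in library replacement like this works in practice, here is a minimal sketch for Linux. The directory path and binary name are assumptions for illustration, not from the article: the idea is simply that prepending the directory holding ZLUDA's replacement CUDA libraries to the dynamic loader's search path makes an unmodified CUDA binary pick them up instead of NVIDIA's.

```shell
# Hypothetical sketch -- /opt/zluda and ./my-cuda-app are placeholder names.
# The dynamic loader searches LD_LIBRARY_PATH directories first, so the
# ZLUDA replacement libraries shadow any system-installed NVIDIA ones.
export LD_LIBRARY_PATH=/opt/zluda:${LD_LIBRARY_PATH:-}
echo "$LD_LIBRARY_PATH"   # the ZLUDA directory now leads the search path
# ./my-cuda-app           # then launch the unmodified CUDA binary as usual
```

This is the general mechanism behind any drop-in shared-library replacement on Linux; no recompilation or source changes are needed because the application still links against the same CUDA library names.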

Andrzej Janik reached out and provided access to the new ZLUDA implementation for AMD ROCm to allow me to test it out and benchmark it in advance of today's planned public announcement. I've been testing it for a few days and it's been a positive experience: CUDA-enabled software indeed runs atop ROCm without any changes. Even proprietary renderers and the like work with this "CUDA on Radeon" implementation.

The ZLUDA implementation isn't 100% complete, though: NVIDIA OptiX isn't fully supported, and some cases, such as software not using PTX assembly code, aren't currently handled. But for the most part this implementation is surprisingly capable for a single-developer effort.

For those wondering about the open-source code, it's dual-licensed under either Apache 2.0 or MIT. Rust fans will be excited to know that the Rust programming language is leveraged for this Radeon implementation.

NOTE: In my screenshots, and for the past two years of development, the device name exposed via CUDA for Radeon GPUs has just been "Graphics Device" rather than the actual AMD Radeon graphics adapter with ROCm. The reason: with CUDA benchmarks auto-reporting results and other software carrying automated telemetry, the generic "Graphics Device" string avoided leaking the fact that Radeon GPUs were being used under CUDA. I'm told that as part of today's open-sourcing of this ZLUDA on Radeon code, the change will be in place to expose the actual Radeon graphics card string rather than the generic "Graphics Device" concealer.