Programming AMD GPUs with Julia

Julia support for programming AMD GPUs is currently provided by the AMDGPU.jl package. This package contains everything necessary to program for AMD GPUs in Julia, including:

  • An interface for compiling and running kernels written in Julia through LLVM's AMDGPU backend.
  • An interface for working with the HIP runtime API, necessary for launching compiled kernels and controlling the GPU.
  • An array type implementing the GPUArrays.jl interface, providing high-level array operations.

Installation

Simply add the AMDGPU.jl package to your Julia environment:

using Pkg
Pkg.add("AMDGPU")

To ensure that everything works, you can run the test suite:

using AMDGPU
using Pkg
Pkg.test("AMDGPU")

Requirements

  • Julia 1.9 or higher (Navi 3 requires Julia 1.10+).
  • 64-bit Linux or Windows.
  • Minimum supported ROCm version: 5.3.
  • Required software:

    Linux    Windows
    ROCm     ROCm
    -        AMD Software: Adrenalin Edition

On Windows, AMD Software: Adrenalin Edition contains the HIP library itself, while ROCm provides support for other functionality.

Windows OS missing functionality

Windows does not yet support hostcalls, so some functionality does not work, such as:

  • device printing;
  • dynamic memory allocation (from kernels).

Hostcalls are sometimes launched when AMDGPU detects that a kernel might throw an exception, specifically during conversions such as Int32(1f0).

To avoid this, use the 'unsafe' conversion instead: unsafe_trunc(Int32, 1f0).
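The difference between the two conversions can be observed on the CPU, without a GPU. The following is a minimal sketch in plain Julia (the variable names are illustrative): the checked conversion has to be able to throw an InexactError, whereas unsafe_trunc has no exception path.

```julia
# Checked conversion: succeeds only when the value is exactly representable.
checked_ok = Int32(1f0)            # 1f0 is an exact integer, so this works

# A non-integer value makes the checked conversion throw.
checked_failed = try
    Int32(1.5f0)
    false
catch err
    err isa InexactError
end

# Unchecked conversion: truncates toward zero and never throws.
truncated = unsafe_trunc(Int32, 1.5f0)

println((checked_ok, checked_failed, truncated))
```

Note that unsafe_trunc returns an arbitrary value when the input does not fit in the target type, so it is only appropriate when the range of the input is known.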

ROCm system libraries

AMDGPU.jl looks into standard directories and uses Libdl.find_library to find ROCm libraries.

Standard paths:

  • Linux: /opt/rocm
  • Windows: C:/Program Files/AMD/ROCm/<rocm-version>

If ROCm is installed in a non-standard location, set the ROCM_PATH=<path> environment variable before launching Julia.
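To check which location would likely be picked up, something like the following sketch can help. The helper rocm_root is hypothetical and not part of AMDGPU.jl, and it assumes that ROCM_PATH takes priority over the standard location; the actual discovery logic in the package may differ.

```julia
# Hypothetical helper, not part of AMDGPU.jl: guess the ROCm root directory.
# Assumption: an explicit ROCM_PATH overrides the standard location.
function rocm_root(env = ENV)
    path = get(env, "ROCM_PATH", "")
    isempty(path) || return path
    # Standard Linux location. The Windows path contains a version
    # directory, so it is not guessed here.
    return Sys.islinux() ? "/opt/rocm" : nothing
end

root = rocm_root()
println("ROCm root candidate: ", something(root, "<not determined>"))
root === nothing || println("Directory exists: ", isdir(root))
```

If the printed directory does not exist, that usually indicates that ROCM_PATH needs to be set or corrected.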

ROCm artifacts

There is limited support for ROCm 5.4+ artifacts which can be enabled with AMDGPU.use_artifacts!.

"Limited" means that not all libraries are available and some functionality may be disabled.

AMDGPU.ROCmDiscovery.use_artifacts! - Function
use_artifacts!(flag::Bool = true)

Pass true to switch from the system-wide ROCm installation to artifacts. When using artifacts, a system-wide installation is not needed at all.

Extra Setup Details

Additional steps that may be needed to ensure everything works:

  • Make sure your user belongs to the group that owns /dev/kfd (other than root).

    For example, it might be the render group:

    crw-rw---- 1 root render 234, 0 Aug 5 11:43 kfd

    In this case, you can add yourself to it:

    sudo usermod -aG render username

  • ROCm libraries should be in the standard library locations, or in your LD_LIBRARY_PATH.

  • If you get an error message along the lines of GLIBCXX_... not found, it's possible that the C++ runtime used to build the ROCm stack and the one used by Julia are different. If you built the ROCm stack yourself, this is very likely the case, since Julia normally ships with its own C++ runtime.

    For more information, check out this GitHub issue. A quick fix is to use the LD_PRELOAD environment variable to make Julia use the system C++ runtime library, for example:

    LD_PRELOAD=/usr/lib/libstdc++.so julia

    Alternatively, you can build Julia from source as described here. To quickly debug this issue, start Julia and try to load a ROCm library:

    using Libdl
    Libdl.dlopen("/opt/rocm/hsa/lib/libhsa-runtime64.so.1")
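When diagnosing library loading problems such as the C++ runtime mismatch described above, it can be useful to print the loader's error message instead of stopping at the exception. A minimal sketch follows; try_load is an illustrative helper, not part of AMDGPU.jl, and the library path is the one from the text above and may differ on your system.

```julia
using Libdl

# Illustrative helper: try to load a shared library and report the result.
function try_load(path::AbstractString)
    try
        handle = Libdl.dlopen(path)
        println("OK: ", Libdl.dlpath(handle))
        Libdl.dlclose(handle)
        return true
    catch err
        # The message typically names the missing library or symbol version.
        println("FAILED: ", sprint(showerror, err))
        return false
    end
end

# Path taken from the text above; adjust it to your installation.
try_load("/opt/rocm/hsa/lib/libhsa-runtime64.so.1")
```
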

Once all of this is set up properly, you should be able to run using AMDGPU successfully.

See the Quick Start documentation for an introduction to using AMDGPU.jl.

Preferences

AMDGPU.jl supports setting preferences. A template of LocalPreferences.toml with all options:

[AMDGPU]
# If `true` then use ROCm libraries provided by artifacts.
# However, not all ROCm libraries are available as artifacts.
use_artifacts = false
# Use non-blocking synchronization for all `AMDGPU.synchronize()` calls.
nonblocking_synchronization = true
# Memory limit specifies the maximum amount of memory, as a percentage,
# that the current Julia process can use.
# Default is "none", which does not apply any limitation.
hard_memory_limit = "none"
# Note the space between the value and the percent sign.
# hard_memory_limit = "80 %"
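Before launching Julia with a hand-edited preferences file, it may help to confirm that the file parses as valid TOML and contains the expected keys. The sketch below uses Julia's TOML standard library on an inline copy of the template, so it does not require AMDGPU.jl; it only checks the syntax and does not reproduce how the package interprets the values.

```julia
using TOML

# Inline copy of the template above, so the sketch is self-contained.
template = """
[AMDGPU]
use_artifacts = false
nonblocking_synchronization = true
hard_memory_limit = "80 %"
"""

# Parse the text and select the AMDGPU section.
prefs = TOML.parse(template)["AMDGPU"]

# Print every option with its parsed value.
for key in sort!(collect(keys(prefs)))
    println(rpad(key, 28), " = ", repr(prefs[key]))
end
```

To check an actual file instead of the inline string, TOML.parsefile("LocalPreferences.toml") can be used in place of TOML.parse(template).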