## Using Different Backends
For any of the examples here, simply use a different GPU array and AcceleratedKernels.jl will pick the right backend:
```julia
# Intel Graphics
using oneAPI
v = oneArray{Int32}(undef, 100_000)     # Empty array

# AMD ROCm
using AMDGPU
v = ROCArray{Float64}(1:100_000)        # A range converted to Float64

# Apple Metal
using Metal
v = MtlArray(rand(Float32, 100_000))    # Transfer from host to device

# NVIDIA CUDA
using CUDA
v = CuArray{UInt32}(0:5:100_000)        # Range with explicit step size

# Transfer a GPU array back to the host
v_host = Array(v)
```
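As a quick sanity check that the dispatch is transparent, here is a minimal sketch (assuming a CUDA-capable device; the explicit `init` is passed only for illustration) showing the same `AK.reduce` call used in the CPU example below running unchanged on a device array:

```julia
using CUDA
import AcceleratedKernels as AK

v = CuArray{Int32}(1:100_000)

# Same unified interface; AcceleratedKernels.jl dispatches to the CUDA backend
s = AK.reduce(+, v, init=Int32(0))
```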
All publicly exposed functions also have CPU implementations with unified parameter interfaces:
```julia
import AcceleratedKernels as AK

v = Vector(-1000:1000)                  # Normal CPU array
AK.reduce(+, v, max_tasks=Threads.nthreads())
```
Note that the `reduce` and `mapreduce` CPU implementations forward arguments to OhMyThreads.jl, an excellent package for multithreading. The focus of AcceleratedKernels.jl is to provide a unified interface to high-performance implementations of common algorithmic kernels, for both CPUs and GPUs - if you need fine-grained control over threads, scheduling, or communication for specialised algorithms (e.g. with highly unequal workloads), consider using OhMyThreads.jl or KernelAbstractions.jl directly.
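For example, here is a hedged sketch of the same pattern with `mapreduce`, assuming it accepts the same `max_tasks` keyword as `reduce` above (the lambda and `init` value are illustrative):

```julia
import AcceleratedKernels as AK

v = Vector(-1000:1000)

# Sum of squares; the CPU path is multithreaded via OhMyThreads.jl
s = AK.mapreduce(x -> x * x, +, v, init=0, max_tasks=Threads.nthreads())
```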
There is ongoing work on multithreaded CPU `sort` and `accumulate` implementations - at the moment, they fall back to single-threaded algorithms; the rest of the library is fully parallelised for both CPUs and GPUs.
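The fallbacks still go through the same unified interface, so code written today will transparently benefit once the multithreaded versions land. A minimal sketch (assuming the `AK.sort!` and `AK.accumulate` entry points behave like their Base counterparts; the explicit `init` is passed for safety):

```julia
import AcceleratedKernels as AK

v = rand(Float32, 100_000)              # Normal CPU array

AK.sort!(v)                             # Currently single-threaded on CPU
ps = AK.accumulate(+, v, init=0.0f0)    # Prefix sum, same single-threaded fallback
```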