# Kernel Programming
For maximum performance or custom operations not covered by high-level array abstractions, you can write custom kernels in Julia that execute on the GPU.
## The `@oneapi` Macro
The `@oneapi` macro launches a kernel on the device. It takes launch-configuration arguments: the number of work-items per group (`items`, comparable to CUDA threads) and the number of workgroups (`groups`, comparable to CUDA blocks).
```julia
using oneAPI

function kernel(a, b)
    i = get_global_id()
    if i <= length(a)
        @inbounds a[i] += b[i]
    end
    return
end
```
```julia
a = oneArray(rand(Float32, 100))
b = oneArray(rand(Float32, 100))

# Launch configuration
items = 100
groups = 1

@oneapi items=items groups=groups kernel(a, b)
```

## KernelAbstractions.jl
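The example above fits the whole array into a single workgroup. For larger arrays, the usual pattern is to cap `items` at the device's workgroup limit and derive `groups` from the problem size, rounding up. A minimal sketch, reusing the `kernel` above (the 256-item cap is an assumption; query your device for its actual limit):

```julia
using oneAPI

N = 10_000
a = oneArray(rand(Float32, N))
b = oneArray(rand(Float32, N))

# Assumed per-group work-item cap; the real maximum is device-specific
items = 256
# Round up so that items * groups covers all N elements
groups = cld(N, items)

@oneapi items=items groups=groups kernel(a, b)
```

The bounds check `i <= length(a)` inside the kernel is what makes this rounded-up launch safe: trailing work-items past `N` simply do nothing.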
For portable kernel programming, it is highly recommended to use KernelAbstractions.jl. This allows you to write kernels that work on CPU, CUDA, ROCm, and oneAPI.
```julia
using KernelAbstractions, oneAPI

@kernel function my_kernel!(a, b)
    i = @index(Global, Linear)
    @inbounds a[i] += b[i]
end

a = oneArray(rand(Float32, 100))
b = oneArray(rand(Float32, 100))

# Get the backend from an existing array
backend = get_backend(a)

# Instantiate the kernel for that backend
k = my_kernel!(backend)

# Launch with configuration
k(a, b; ndrange=length(a))

# Kernel launches are asynchronous; wait before reading the results
KernelAbstractions.synchronize(backend)
```

## Device Intrinsics
Inside a kernel, you can use various intrinsics to interact with the hardware:
- `get_global_id()`: Get the global thread ID.
- `get_local_id()`: Get the local thread ID within a workgroup.
- `get_group_id()`: Get the workgroup ID.
- `barrier()`: Synchronize all threads within a workgroup.
These correspond to standard OpenCL/Level Zero intrinsics.
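These intrinsics compose in the usual OpenCL pattern. As a sketch, the kernel below reverses each workgroup's chunk of an array through workgroup-local memory; it assumes `oneLocalArray` for local storage and the OpenCL-style `get_local_size()` intrinsic, and hard-codes a group size of 64 for illustration:

```julia
using oneAPI

function reverse_groups!(a)
    # Workgroup-local scratch space (statically sized; assumes items=64)
    tmp = oneLocalArray(Float32, 64)

    li = get_local_id()    # 1-based index within the workgroup
    gi = get_global_id()   # 1-based global index

    @inbounds tmp[li] = a[gi]
    barrier()              # wait until the whole group has filled tmp

    # Each work-item reads back its group's mirrored element
    @inbounds a[gi] = tmp[get_local_size() - li + 1]
    return
end

a = oneArray(collect(Float32, 1:128))
@oneapi items=64 groups=2 reverse_groups!(a)
```

The `barrier()` call is essential: without it, a work-item could read `tmp` before its neighbors have written their elements.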