Quick start

First, write the kernel function, making sure it only uses features from the CUDA-supported subset of Julia:

using CUDAnative

function kernel_vadd(a, b, c)
    # global linear index of this thread (block and thread indices are 1-based)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    c[i] = a[i] + b[i]

    return nothing
end
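To see why a launch of one block with twelve threads touches every element of a 3x4 array exactly once, the index arithmetic of kernel_vadd can be emulated on the CPU. This is a plain-Julia sketch: global_indices is a hypothetical helper, not part of CUDAnative, and it assumes a 1D launch with 1-based block and thread numbering.

```julia
# Hypothetical helper: list the linear index each thread would compute for a
# 1D launch of `blocks` blocks with `threads` threads each, using the same
# formula as kernel_vadd: i = (blockIdx().x-1) * blockDim().x + threadIdx().x
function global_indices(blocks::Integer, threads::Integer)
    idx = Int[]
    for bx in 1:blocks, tx in 1:threads   # tx varies fastest
        push!(idx, (bx - 1) * threads + tx)
    end
    return idx
end

global_indices(1, 12)  # indices 1 through 12, one per element
global_indices(3, 4)   # also 1 through 12, split over three blocks
```

Both configurations cover the same twelve linear indices, which is why the kernel can index the 3x4 arrays with a single linear index.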

Using the @cuda macro, you can launch the kernel on a GPU of your choice:

using CUDAdrv, CUDAnative
using Base.Test

# CUDAdrv functionality: generate and upload data
a = round.(rand(Float32, (3, 4)) * 100)
b = round.(rand(Float32, (3, 4)) * 100)
d_a = CuArray(a)
d_b = CuArray(b)
d_c = similar(d_a)  # output array

# run the kernel on 1 block of 12 threads, one thread per element
# syntax: @cuda (dims...) kernel(args...)
@cuda (1,12) kernel_vadd(d_a, d_b, d_c)

# CUDAdrv functionality: download data
# this synchronizes the device
c = Array(d_c)

@test a+b ≈ c
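The launch above hard-codes (1,12) because the arrays hold exactly twelve elements. For an arbitrary element count, one common approach is to cap the number of threads per block and round the number of blocks up. The sketch below assumes the (blocks, threads) ordering used in that launch; launch_dims and its 256-thread cap are illustrative choices, not CUDAnative API.

```julia
# Hypothetical helper, not part of CUDAnative: choose a 1D launch
# configuration (blocks, threads) that covers n elements, with at most
# max_threads threads per block (256 is an illustrative default).
function launch_dims(n::Integer; max_threads::Integer = 256)
    n > 0 || throw(ArgumentError("n must be positive"))
    threads = min(n, max_threads)
    blocks = cld(n, threads)  # ceiling division: every element gets a thread
    return (blocks, threads)
end

launch_dims(12)    # (1, 12), the configuration used above
launch_dims(1000)  # (4, 256): 1024 threads for 1000 elements
```

With (4, 256), the last 24 threads would compute indices past the end of the arrays, so a kernel launched this way would need a check such as i <= length(c) before reading or writing.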

This code is executed in a default, global context for the first device in your system. Because the compiler queries the active context through CuCurrentContext, you can easily switch contexts (to use a different device, or to supply different flags) by activating a different one:

dev = CuDevice(0)
CuContext(dev) do ctx
    # allocate things in this context
    @cuda ...
end

Julia support

This package supports only a limited subset of Julia. The subset is not documented, because it is still changing too quickly.

In general, whether Julia code can run on the GPU depends on the language features it uses. Some parts of the language are disallowed outright, such as calls into the Julia runtime or garbage-collected allocations. Other features are supported in a reduced form; for example, throwing an exception results in a trap.
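Since garbage allocations are disallowed on the GPU, one rough CPU-side check is to measure a function's heap allocations with Julia's @allocated macro. This is only a heuristic sketch, not something CUDAnative provides: an allocation-free function may still use other unsupported features such as runtime calls. The names vadd_cpu!, vadd_alloc, allocs_inplace and allocs_alloc are hypothetical.

```julia
# Hypothetical CPU-side analogue of kernel_vadd: writes into a preallocated
# output array instead of creating a new one.
function vadd_cpu!(c, a, b)
    @inbounds for i in eachindex(c)
        c[i] = a[i] + b[i]
    end
    return nothing
end

# Hypothetical allocating version: a + b creates a new array on the heap.
vadd_alloc(a, b) = a + b

# Bytes allocated per call, measured after a warm-up call so that
# compilation itself is not counted.
function allocs_inplace(c, a, b)
    vadd_cpu!(c, a, b)
    return @allocated vadd_cpu!(c, a, b)
end

function allocs_alloc(a, b)
    vadd_alloc(a, b)
    return @allocated vadd_alloc(a, b)
end

a = rand(Float32, 12); b = rand(Float32, 12); c = similar(a)
allocs_inplace(c, a, b)  # expected to be 0
allocs_alloc(a, b)       # positive: a new array is allocated
```

The in-place style mirrors kernel_vadd, which receives its output array c as an argument rather than allocating it.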

If your code is incompatible with GPU execution, the compiler reports the unsupported feature and where it was used:

julia> foo(i) = (print("can't do this"); return nothing)
foo (generic function with 1 method)

julia> @cuda (1,1) foo(1)
ERROR: error compiling foo: error compiling print: generic call to unsafe_write requires the runtime language feature

In addition, the JIT does not support certain modes of compilation. For example, recursive functions require properly cached compilation, which is currently not implemented.
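A common workaround for the missing recursion support is to rewrite a recursive function as a loop. The sketch below is plain Julia; fib_rec and fib_iter are hypothetical examples, not part of CUDAnative. The iterative form uses only integer arithmetic and no recursive calls, so it avoids the compilation mode the JIT currently lacks.

```julia
# Recursive definition: calls itself, which the JIT cannot currently compile.
fib_rec(n::Integer) = n < 2 ? n : fib_rec(n - 1) + fib_rec(n - 2)

# Equivalent loop: no recursion and no allocations, only integer arithmetic.
function fib_iter(n::Integer)
    a, b = zero(n), one(n)
    for _ in 1:n
        a, b = b, a + b
    end
    return a
end

fib_iter(10)  # 55, same as fib_rec(10)
```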

CUDA support

Not all of CUDA is supported, and because of time constraints the supported subset is, again, undocumented. The following (incomplete) list details supported features and their CUDAnative.jl names. Most are implemented in intrinsics.jl, so consult that file for a more up-to-date list:


In addition to the native intrinsics listed above, math functionality from libdevice is wrapped and made part of CUDAnative. For now, calls to these intrinsics must be fully qualified with the CUDAnative module name. They provide functionality similar to some of the low-level math functions in Base, which would otherwise call out to libm.