API

Kernel language

KernelAbstractions.@kernel (Macro)
@kernel function f(args) end

Takes a function definition and generates a Kernel constructor from it. The enclosed function is allowed to contain kernel language constructs. In order to call it the kernel has first to be specialized on the backend and then invoked on the arguments.

Kernel language

  • @Const
  • @index
  • @private
  • @synchronize
  • @print
  • @uniform
  • @groupsize

Example:

@kernel function vecadd(A, @Const(B))
    I = @index(Global)
    @inbounds A[I] += B[I]
end

A = ones(1024)
B = rand(1024)
backend = CPU()
vecadd(backend, 64)(A, B, ndrange=size(A))
synchronize(backend)
@kernel config function f(args) end

This allows for two different configurations:

  1. cpu={true, false}: Disables code-generation of the CPU function. This relaxes semantics such that KernelAbstractions primitives can be used in non-kernel functions.
  2. inbounds={false, true}: Wraps the entire function definition in a forced @inbounds, for kernels that would otherwise need many individual @inbounds annotations. Note that this can lead to incorrect results, crashes, etc., and is fundamentally unsafe. Be careful!
Warn

This is an experimental feature.
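
As an illustrative sketch (the kernel name and launch sizes below are our own, not part of the API), the inbounds configuration can be combined with the vector-add example from above:

```julia
using KernelAbstractions

# `inbounds=true` wraps the whole kernel body in a forced @inbounds,
# replacing per-expression @inbounds annotations. This is only safe
# when `ndrange` never exceeds the bounds of the arrays.
@kernel inbounds=true function vecadd2(A, @Const(B))
    I = @index(Global)
    A[I] += B[I]
end

A = ones(1024)
B = rand(1024)
backend = CPU()
vecadd2(backend, 64)(A, B, ndrange=size(A))
synchronize(backend)
```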

KernelAbstractions.@Const (Macro)
@Const(A)

@Const is an argument annotation that asserts that the memory referenced by A is not written to as part of the kernel and does not alias any other memory in the kernel.

Danger

Violating those constraints will lead to arbitrary behaviour.

As an example given a kernel signature kernel(A, @Const(B)), you are not allowed to call the kernel with kernel(A, A) or kernel(A, view(A, :)).

KernelAbstractions.@index (Macro)
@index

The @index macro can be used to give you the index of a workitem within a kernel function. It supports the production of either a linear index or a Cartesian index. A Cartesian index is a general N-dimensional index derived from the iteration space.

Index granularity

  • Global: Used to access global memory.
  • Group: The index of the workgroup.
  • Local: The index within the workgroup.

Index kind

  • Linear: Produces an Int64 that can be used to linearly index into memory.
  • Cartesian: Produces a CartesianIndex{N} that can be used to index into memory.
  • NTuple: Produces a NTuple{N} that can be used to index into memory.

If the index kind is not provided it defaults to Linear; this is subject to change.

Examples

@index(Global, Linear)
@index(Global, Cartesian)
@index(Local, Cartesian)
@index(Group, Linear)
@index(Local, NTuple)
@index(Global)
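
To illustrate the Cartesian kind, here is a sketch of a hypothetical transpose kernel (the kernel name and sizes are our own):

```julia
using KernelAbstractions

@kernel function transpose_kernel!(dst, @Const(src))
    I, J = Tuple(@index(Global, Cartesian))  # 2-dimensional index
    @inbounds dst[J, I] = src[I, J]
end

backend = CPU()
src = rand(8, 4)
dst = zeros(4, 8)
transpose_kernel!(backend, (4, 4))(dst, src, ndrange=size(src))
synchronize(backend)
# dst now holds the transpose of src
```
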
KernelAbstractions.@private (Macro)
@private T dims

Declare storage that is local to each item in the workgroup. This can be safely used across @synchronize statements. On a CPU, this will allocate additional implicit dimensions to ensure correct localization.

For storage that only persists between @synchronize statements, an MArray can be used instead.

See also @uniform.
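
A minimal sketch (the kernel name is our own; CPU backend assumed) of private storage that survives a barrier:

```julia
using KernelAbstractions

@kernel function double_after_barrier!(A)
    I = @index(Global, Linear)
    p = @private eltype(A) (1,)   # one private slot per workitem
    @inbounds p[1] = A[I]         # stash the original value
    @synchronize()
    @inbounds A[I] = 2 * p[1]     # still valid after the barrier
end

backend = CPU()
A = collect(1.0:8.0)
double_after_barrier!(backend, 4)(A, ndrange=length(A))
synchronize(backend)
# each element of A is doubled
```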

@private mem = 1

Creates a private local copy of mem for each item in the workgroup. This can be safely used across @synchronize statements.

KernelAbstractions.@synchronize (Macro)
@synchronize()

After a @synchronize statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup.
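
As a sketch of the barrier in action, the hypothetical kernel below stages values through workgroup-local memory (@localmem, another kernel-language primitive) and reverses each workgroup's block:

```julia
using KernelAbstractions

@kernel function reverse_blocks!(A)
    I = @index(Global, Linear)
    i = @index(Local, Linear)
    N = @uniform prod(@groupsize())
    tmp = @localmem eltype(A) (4,)   # must match the launch workgroup size below
    @inbounds tmp[i] = A[I]
    @synchronize()                   # all writes to tmp are now visible
    @inbounds A[I] = tmp[N - i + 1]
end

backend = CPU()
A = collect(1.0:8.0)
reverse_blocks!(backend, 4)(A, ndrange=length(A))
synchronize(backend)
# each block of 4 is reversed: [4.0, 3.0, 2.0, 1.0, 8.0, 7.0, 6.0, 5.0]
```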

@synchronize(cond)

After a @synchronize statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup. cond is not allowed to have any visible side effects.

Platform differences

  • GPU: This synchronization will only occur if cond evaluates to true.
  • CPU: This synchronization will always occur.
KernelAbstractions.@print (Macro)
@print(items...)

This is a unified print statement.

Platform differences

  • GPU: This will reorganize the items to print via @cuprintf
  • CPU: This will call print(items...)
KernelAbstractions.@uniform (Macro)
@uniform expr

expr is evaluated outside the workitem scope. This is useful for variable declarations that span workitems, or are reused across @synchronize statements.
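
A small sketch (the kernel name is our own) in which the array length is hoisted out of the workitem scope:

```julia
using KernelAbstractions

@kernel function normalize_by_length!(A)
    N = @uniform length(A)   # evaluated once, outside the workitem scope
    I = @index(Global)
    @inbounds A[I] /= N
end

backend = CPU()
A = ones(4)
normalize_by_length!(backend, 4)(A, ndrange=length(A))
synchronize(backend)
# A == [0.25, 0.25, 0.25, 0.25]
```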

KernelAbstractions.@groupsize (Macro)
@groupsize()

Query the workgroup size on the backend. This function returns a tuple corresponding to the kernel configuration. To get the total size, use prod(@groupsize()).

KernelAbstractions.allocate (Function)
allocate(::Backend, Type, dims...)::AbstractArray

Allocate a storage array appropriate for the computational backend.

Note

Backend implementations must implement allocate(::NewBackend, T, dims::Tuple)
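
For illustration, on the CPU backend (variable names are our own):

```julia
using KernelAbstractions

backend = CPU()
A = allocate(backend, Float32, 64)     # uninitialized; contents are arbitrary
fill!(A, 1f0)
B = allocate(backend, Float64, 8, 8)   # dims... splat form
```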


Host language

KernelAbstractions.zeros (Function)
zeros(::Backend, Type, dims...)::AbstractArray

Allocate a storage array appropriate for the computational backend filled with zeros.


Internal

KernelAbstractions.Kernel (Type)
Kernel{Backend, WorkgroupSize, NDRange, Func}

Kernel closure struct that is used to represent the backend kernel on the host. WorkgroupSize is the number of workitems in a workgroup.

Note

Backend implementations must implement:

(kernel::Kernel{<:NewBackend})(args...; ndrange=nothing, workgroupsize=nothing)

As well as the on-device functionality.

KernelAbstractions.@context (Macro)
@context()

Access the hidden context object used by KernelAbstractions.

Warn

Only valid to be used from a kernel with cpu=false.

function f(@context, a)
    I = @index(Global, Linear)
    a[I]
end

@kernel cpu=false function my_kernel(a)
    f(@context, a)
end