API
Kernel language
KernelAbstractions.@kernel
— Macro
@kernel function f(args) end
Takes a function definition and generates a Kernel
constructor from it. The enclosed function is allowed to contain kernel language constructs. In order to call it, the kernel first has to be specialized on the backend and then invoked on the arguments.
Example:
@kernel function vecadd(A, @Const(B))
I = @index(Global)
@inbounds A[I] += B[I]
end
backend = CPU()
A = ones(1024)
B = rand(1024)
vecadd(backend, 64)(A, B, ndrange=size(A))
synchronize(backend)
@kernel config function f(args) end
This allows for two different configurations:
cpu={true, false}: Disables code generation of the CPU function. This relaxes semantics such that KernelAbstractions primitives can be used in non-kernel functions.
inbounds={false, true}: Enables a forced @inbounds macro around the function definition, in case the user is already using many @inbounds annotations in their kernel. Note that this can lead to incorrect results, crashes, etc. and is fundamentally unsafe. Be careful!
This is an experimental feature.
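As a sketch of the inbounds configuration (the kernel name and body here are illustrative, not part of the API):

```julia
using KernelAbstractions

# Hypothetical kernel: inbounds=true wraps the whole body in @inbounds,
# so explicit per-access annotations can be dropped.
@kernel inbounds=true function scale!(A, s)
    I = @index(Global)
    A[I] = s * A[I]   # no bounds check emitted
end

A = ones(16)
scale!(CPU(), 16)(A, 2.0, ndrange=length(A))
synchronize(CPU())
```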
KernelAbstractions.@Const
— Macro
@Const(A)
@Const
is an argument annotation that asserts that the memory referenced by A
is both not written to as part of the kernel and that it does not alias any other memory in the kernel.
Violating those constraints will lead to arbitrary behaviour.
As an example, given a kernel signature kernel(A, @Const(B)), you are not allowed to call the kernel with kernel(A, A) or kernel(A, view(A, :)).
KernelAbstractions.@index
— Macro
@index
The @index
macro can be used to give you the index of a workitem within a kernel function. It supports both the production of a linear index or a cartesian index. A cartesian index is a general N-dimensional index that is derived from the iteration space.
Index granularity
Global: Used to access global memory.
Group: The index of the workgroup.
Local: The index within the workgroup.
Index kind
Linear: Produces an Int64 that can be used to linearly index into memory.
Cartesian: Produces a CartesianIndex{N} that can be used to index into memory.
NTuple: Produces an NTuple{N} that can be used to index into memory.
If the index kind is not provided it defaults to Linear; this is subject to change.
Examples
@index(Global, Linear)
@index(Global, Cartesian)
@index(Local, Cartesian)
@index(Group, Linear)
@index(Local, NTuple)
@index(Global)
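A minimal sketch combining these forms (the transpose kernel here is illustrative, not part of the API):

```julia
using KernelAbstractions

# Hypothetical 2D kernel: the NTuple index destructures into one
# index per dimension of the iteration space.
@kernel function transpose!(B, @Const(A))
    i, j = @index(Global, NTuple)
    @inbounds B[j, i] = A[i, j]
end

A = rand(32, 32)
B = similar(A)
transpose!(CPU(), (16, 16))(B, A, ndrange=size(A))
synchronize(CPU())
```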
KernelAbstractions.@localmem
— Macro
@localmem T dims
Declare storage that is local to a workgroup.
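A sketch of a workgroup-local reduction using @localmem. The fixed tile size of 64 is an assumption and must match the workgroupsize used at launch; the kernel itself is illustrative:

```julia
using KernelAbstractions

# Hypothetical kernel: each workgroup sums its 64-element slice of A into out.
@kernel function groupsum!(out, @Const(A))
    i = @index(Local, Linear)
    g = @index(Group, Linear)
    tile = @localmem eltype(A) (64,)      # shared by the whole workgroup
    tile[i] = A[@index(Global, Linear)]
    @synchronize                          # all writes to tile are now visible
    if i == 1
        acc = zero(eltype(A))
        for j in 1:64
            acc += tile[j]
        end
        out[g] = acc
    end
end

A = ones(256)
out = zeros(4)
groupsum!(CPU(), 64)(out, A, ndrange=length(A))
synchronize(CPU())
```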
KernelAbstractions.@private
— Macro
@private T dims
Declare storage that is local to each item in the workgroup. This can be safely used across @synchronize
statements. On a CPU, this will allocate additional implicit dimensions to ensure correct localization.
For storage that only persists between @synchronize
statements, an MArray
can be used instead.
See also @uniform
.
@private mem = 1
Creates a private local of mem
per item in the workgroup. This can be safely used across @synchronize
statements.
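A sketch of @private storage surviving a barrier (the kernel is illustrative, not part of the API):

```julia
using KernelAbstractions

# Hypothetical kernel: stash a per-item value before the barrier,
# then use it afterwards.
@kernel function keep!(out, @Const(A))
    I = @index(Global, Linear)
    p = @private eltype(A) (1,)
    p[1] = A[I]
    @synchronize                 # p is still valid on the other side
    out[I] = 2 * p[1]
end

A = rand(128)
out = similar(A)
keep!(CPU(), 64)(out, A, ndrange=length(A))
synchronize(CPU())
```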
KernelAbstractions.@synchronize
— Macro
@synchronize()
After a @synchronize
statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup.
@synchronize(cond)
After a @synchronize
statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup. cond
is not allowed to have any visible side effects.
Platform differences
GPU: This synchronization will only occur if cond evaluates to true.
CPU: This synchronization will always occur.
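A sketch of why the barrier matters when workitems read each other's writes (the kernel is illustrative; the group size of 64 is an assumption matching the launch):

```julia
using KernelAbstractions

# Hypothetical kernel: rotate values within each workgroup of 64 items.
# Without the barrier, tmp[mod1(i + 1, 64)] could be read before it is written.
@kernel function rotate!(B, @Const(A))
    i = @index(Local, Linear)
    I = @index(Global, Linear)
    tmp = @localmem eltype(A) (64,)
    tmp[i] = A[I]
    @synchronize
    B[I] = tmp[mod1(i + 1, 64)]
end

A = collect(1.0:128.0)
B = similar(A)
rotate!(CPU(), 64)(B, A, ndrange=length(A))
synchronize(CPU())
```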
KernelAbstractions.@print
— Macro
@print(items...)
This is a unified print statement.
Platform differences
GPU: This will reorganize the items to print via @cuprintf.
CPU: This will call print(items...).
KernelAbstractions.@uniform
— Macro
@uniform expr
expr
is evaluated outside the workitem scope. This is useful for variable declarations that span workitems, or are reused across @synchronize
statements.
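A sketch of @uniform for a value shared by all workitems (the saxpy kernel is illustrative):

```julia
using KernelAbstractions

# Hypothetical kernel: N is evaluated outside the workitem scope and
# reused by every item, rather than recomputed per item.
@kernel function saxpy!(Y, @Const(X), a)
    @uniform N = length(Y)
    I = @index(Global, Linear)
    if I <= N
        @inbounds Y[I] = a * X[I] + Y[I]
    end
end

X = rand(100); Y = rand(100)
saxpy!(CPU(), 32)(Y, X, 2.0, ndrange=length(Y))
synchronize(CPU())
```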
KernelAbstractions.@groupsize
— Macro
@groupsize()
Query the workgroupsize on the backend. This function returns a tuple corresponding to the kernel configuration. In order to get the total size you can use prod(@groupsize()).
KernelAbstractions.@ndrange
— Macro
@ndrange()
Query the ndrange on the backend. This function returns a tuple corresponding to the kernel configuration.
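A sketch querying both @groupsize and @ndrange from inside a kernel (the kernel is illustrative):

```julia
using KernelAbstractions

# Hypothetical kernel: record the launch configuration seen by each workitem.
@kernel function config!(gs, nd)
    I = @index(Global, Linear)
    gs[I] = prod(@groupsize())   # workitems per workgroup
    nd[I] = prod(@ndrange())     # total size of the iteration space
end

gs = zeros(Int, 8); nd = zeros(Int, 8)
config!(CPU(), 4)(gs, nd, ndrange=8)
synchronize(CPU())
# each gs entry holds the workgroup size (4); each nd entry holds 8
```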
KernelAbstractions.synchronize
— Function
synchronize(::Backend)
Synchronize the current backend.
Backend implementations must implement this function.
KernelAbstractions.allocate
— Function
allocate(::Backend, Type, dims...)::AbstractArray
Allocate a storage array appropriate for the computational backend.
Backend implementations must implement allocate(::NewBackend, T, dims::Tuple).
Host language
KernelAbstractions.zeros
— Function
zeros(::Backend, Type, dims...)::AbstractArray
Allocate a storage array appropriate for the computational backend filled with zeros.
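A host-side sketch of both allocation entry points. Note that this zeros shadows Base.zeros, so it is clearest to qualify it:

```julia
using KernelAbstractions

backend = CPU()
A = allocate(backend, Float32, 64, 64)               # uninitialized storage
B = KernelAbstractions.zeros(backend, Float32, 64)   # zero-filled storage
synchronize(backend)
```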
Internal
KernelAbstractions.Kernel
— Type
Kernel{Backend, WorkgroupSize, NDRange, Func}
Kernel closure struct that is used to represent the backend kernel on the host. WorkgroupSize
is the number of workitems in a workgroup.
Backend implementations must implement:
(kernel::Kernel{<:NewBackend})(args...; ndrange=nothing, workgroupsize=nothing)
As well as the on-device functionality.
KernelAbstractions.partition
— Function
Partition a kernel for the given ndrange and workgroupsize.
KernelAbstractions.@context
— Macro
@context()
Access the hidden context object used by KernelAbstractions.
Only valid to be used from a kernel with cpu=false
.
function f(@context, a)
I = @index(Global, Linear)
a[I]
end
@kernel cpu=false function my_kernel(a)
f(@context, a)
end