API
Kernel language
KernelAbstractions.@kernel
— Macro
@kernel function f(args) end
Takes a function definition and generates a Kernel
constructor from it. The enclosed function is allowed to contain kernel language constructs. In order to call it, the kernel first has to be specialized on the backend and then invoked on the arguments.
Example:
@kernel function vecadd(A, @Const(B))
I = @index(Global)
@inbounds A[I] += B[I]
end
backend = CPU()
A = ones(1024)
B = rand(1024)
vecadd(backend, 64)(A, B, ndrange=size(A))
synchronize(backend)
@kernel config function f(args) end
This allows for two different configurations:
cpu={true, false}: Disables code generation of the CPU function. This relaxes semantics such that KernelAbstractions primitives can be used in non-kernel functions.
inbounds={false, true}: Enables a forced @inbounds macro around the function definition, in case the user is already using many @inbounds annotations in their kernel. Note that this can lead to incorrect results, crashes, etc. and is fundamentally unsafe. Be careful!
This is an experimental feature.
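As a sketch of the inbounds configuration (the kernel name and body here are illustrative, not part of the API):

```julia
using KernelAbstractions

# Hypothetical kernel: inbounds=true wraps the whole body in @inbounds,
# so explicit per-access annotations can be dropped.
@kernel inbounds=true function scale!(A, s)
    I = @index(Global)
    A[I] = s * A[I]   # no bounds check emitted
end

A = ones(16)
scale!(CPU(), 16)(A, 2.0, ndrange=length(A))
synchronize(CPU())
```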
KernelAbstractions.@Const
— Macro
@Const(A)
@Const
is an argument annotation that asserts that the memory referenced by A
is both not written to as part of the kernel and that it does not alias any other memory in the kernel.
Violating those constraints will lead to arbitrary behaviour.
As an example, given a kernel signature kernel(A, @Const(B)), you are not allowed to call the kernel with kernel(A, A) or kernel(A, view(A, :)).
KernelAbstractions.@index
— Macro
@index
The @index
macro can be used to give you the index of a workitem within a kernel function. It supports both the production of a linear index or a cartesian index. A cartesian index is a general N-dimensional index that is derived from the iteration space.
Index granularity
Global: Used to access global memory.
Group: The index of the workgroup.
Local: The index within the workgroup.
Index kind
Linear: Produces an Int64 that can be used to linearly index into memory.
Cartesian: Produces a CartesianIndex{N} that can be used to index into memory.
NTuple: Produces an NTuple{N} that can be used to index into memory.
If the index kind is not provided it defaults to Linear; this is subject to change.
Examples
@index(Global, Linear)
@index(Global, Cartesian)
@index(Local, Cartesian)
@index(Group, Linear)
@index(Local, NTuple)
@index(Global)
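A minimal sketch combining these forms (the transpose kernel here is illustrative, not part of the API):

```julia
using KernelAbstractions

# Hypothetical 2D kernel: the NTuple index destructures into one
# index per dimension of the iteration space.
@kernel function transpose!(B, @Const(A))
    i, j = @index(Global, NTuple)
    @inbounds B[j, i] = A[i, j]
end

A = rand(32, 32)
B = similar(A)
transpose!(CPU(), (16, 16))(B, A, ndrange=size(A))
synchronize(CPU())
```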
KernelAbstractions.@localmem
— Macro
@localmem T dims
Declare storage that is local to a workgroup.
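A sketch of a workgroup-local reduction using @localmem. The fixed tile size of 64 is an assumption and must match the workgroupsize used at launch; the kernel itself is illustrative:

```julia
using KernelAbstractions

# Hypothetical kernel: each workgroup sums its 64-element slice of A into out.
@kernel function groupsum!(out, @Const(A))
    i = @index(Local, Linear)
    g = @index(Group, Linear)
    tile = @localmem eltype(A) (64,)      # shared by the whole workgroup
    tile[i] = A[@index(Global, Linear)]
    @synchronize                          # all writes to tile are now visible
    if i == 1
        acc = zero(eltype(A))
        for j in 1:64
            acc += tile[j]
        end
        out[g] = acc
    end
end

A = ones(256)
out = zeros(4)
groupsum!(CPU(), 64)(out, A, ndrange=length(A))
synchronize(CPU())
```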
KernelAbstractions.@private
— Macro
@private T dims
Declare storage that is local to each item in the workgroup. This can be safely used across @synchronize
statements. On a CPU, this will allocate additional implicit dimensions to ensure correct localization.
For storage that only persists between @synchronize
statements, an MArray
can be used instead.
See also @uniform
.
@private mem = 1
Creates a private local of mem
per item in the workgroup. This can be safely used across @synchronize
statements.
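A sketch of @private storage surviving a barrier (the kernel is illustrative, not part of the API):

```julia
using KernelAbstractions

# Hypothetical kernel: stash a per-item value before the barrier,
# then use it afterwards.
@kernel function keep!(out, @Const(A))
    I = @index(Global, Linear)
    p = @private eltype(A) (1,)
    p[1] = A[I]
    @synchronize                 # p is still valid on the other side
    out[I] = 2 * p[1]
end

A = rand(128)
out = similar(A)
keep!(CPU(), 64)(out, A, ndrange=length(A))
synchronize(CPU())
```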
KernelAbstractions.@synchronize
— Macro
@synchronize()
After a @synchronize
statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup.
@synchronize(cond)
After a @synchronize
statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup. cond
is not allowed to have any visible side effects.
Platform differences
GPU: This synchronization will only occur if cond evaluates to true.
CPU: This synchronization will always occur.
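A sketch of why the barrier matters when workitems read each other's writes (the kernel is illustrative; the group size of 64 is an assumption matching the launch):

```julia
using KernelAbstractions

# Hypothetical kernel: rotate values within each workgroup of 64 items.
# Without the barrier, tmp[mod1(i + 1, 64)] could be read before it is written.
@kernel function rotate!(B, @Const(A))
    i = @index(Local, Linear)
    I = @index(Global, Linear)
    tmp = @localmem eltype(A) (64,)
    tmp[i] = A[I]
    @synchronize
    B[I] = tmp[mod1(i + 1, 64)]
end

A = collect(1.0:128.0)
B = similar(A)
rotate!(CPU(), 64)(B, A, ndrange=length(A))
synchronize(CPU())
```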
KernelAbstractions.@print
— Macro
@print(items...)
This is a unified print statement.
Platform differences
GPU: This will reorganize the items to print via @cuprintf.
CPU: This will call print(items...).
KernelAbstractions.@uniform
— Macro
@uniform expr
expr
is evaluated outside the workitem scope. This is useful for variable declarations that span workitems, or are reused across @synchronize
statements.
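A sketch of @uniform for a value shared by all workitems (the saxpy kernel is illustrative):

```julia
using KernelAbstractions

# Hypothetical kernel: N is evaluated outside the workitem scope and
# reused by every item, rather than recomputed per item.
@kernel function saxpy!(Y, @Const(X), a)
    @uniform N = length(Y)
    I = @index(Global, Linear)
    if I <= N
        @inbounds Y[I] = a * X[I] + Y[I]
    end
end

X = rand(100); Y = rand(100)
saxpy!(CPU(), 32)(Y, X, 2.0, ndrange=length(Y))
synchronize(CPU())
```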
KernelAbstractions.@groupsize
— Macro
@groupsize()
Query the workgroupsize on the backend. This function returns a tuple corresponding to the kernel configuration. In order to get the total size you can use prod(@groupsize()).
KernelAbstractions.@ndrange
— Macro
@ndrange()
Query the ndrange on the backend. This function returns a tuple corresponding to the kernel configuration.
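A sketch querying both @groupsize and @ndrange from inside a kernel (the kernel is illustrative):

```julia
using KernelAbstractions

# Hypothetical kernel: record the launch configuration seen by each workitem.
@kernel function config!(gs, nd)
    I = @index(Global, Linear)
    gs[I] = prod(@groupsize())   # workitems per workgroup
    nd[I] = prod(@ndrange())     # total size of the iteration space
end

gs = zeros(Int, 8); nd = zeros(Int, 8)
config!(CPU(), 4)(gs, nd, ndrange=8)
synchronize(CPU())
# each gs entry holds the workgroup size (4); each nd entry holds 8
```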
KernelAbstractions.synchronize
— Function
synchronize(::Backend)
Synchronize the current backend.
Backend implementations must implement this function.
KernelAbstractions.allocate
— Function
allocate(::Backend, Type, dims...)::AbstractArray
Allocate a storage array appropriate for the computational backend.
Backend implementations must implement allocate(::NewBackend, T, dims::Tuple).
Host language
KernelAbstractions.zeros
— Function
zeros(::Backend, Type, dims...)::AbstractArray
Allocate a storage array appropriate for the computational backend filled with zeros.
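A host-side sketch of both allocation entry points. Note that this zeros shadows Base.zeros, so it is clearest to qualify it:

```julia
using KernelAbstractions

backend = CPU()
A = allocate(backend, Float32, 64, 64)               # uninitialized storage
B = KernelAbstractions.zeros(backend, Float32, 64)   # zero-filled storage
synchronize(backend)
```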
Internal
KernelAbstractions.Kernel
— Type
Kernel{Backend, WorkgroupSize, NDRange, Func}
Kernel closure struct that is used to represent the backend kernel on the host. WorkgroupSize
is the number of workitems in a workgroup.
Backend implementations must implement:
(kernel::Kernel{<:NewBackend})(args...; ndrange=nothing, workgroupsize=nothing)
As well as the on-device functionality.
KernelAbstractions.partition
— Function
Partition a kernel for the given ndrange and workgroupsize.
KernelAbstractions.@context
— Macro
@context()
Access the hidden context object used by KernelAbstractions.
Only valid to be used from a kernel with cpu=false
.
function f(@context, a)
I = @index(Global, Linear)
a[I]
end
@kernel cpu=false function my_kernel(a)
f(@context, a)
end