API
Kernel language
KernelAbstractions.@kernel — Macro

@kernel function f(args) end

Takes a function definition and generates a Kernel constructor from it. The enclosed function is allowed to contain kernel language constructs. To call it, the kernel must first be specialized on the backend and then invoked on the arguments.
Example:
@kernel function vecadd(A, @Const(B))
    I = @index(Global)
    @inbounds A[I] += B[I]
end
A = ones(1024)
B = rand(1024)
backend = CPU()
vecadd(backend, 64)(A, B, ndrange=size(A))
synchronize(backend)

@kernel config function f(args) end

This allows for the following configurations:

- cpu={true, false}: Disables code generation of the CPU function. This relaxes semantics such that KernelAbstractions primitives can be used in non-kernel functions.
- inbounds={false, true}: Enables a forced @inbounds macro around the function definition, for the case that the user is already using too many @inbounds in their kernel. Note that this can lead to incorrect results, crashes, etc. and is fundamentally unsafe. Be careful!
- unsafe_indices={false, true}: Disables the implicit validation of indices; users must avoid @index(Global).
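A minimal sketch of the inbounds configuration, assuming KernelAbstractions.jl is loaded (the kernel name scale! is hypothetical):

```julia
using KernelAbstractions

# inbounds=true wraps the whole kernel body in @inbounds, so the
# explicit bounds-check elision inside the kernel can be dropped.
@kernel inbounds=true function scale!(A, s)
    I = @index(Global)
    A[I] = A[I] * s   # no @inbounds needed here
end

A = ones(Float64, 16)
backend = CPU()
scale!(backend, 8)(A, 2.0, ndrange=length(A))
synchronize(backend)
# A now contains 2.0 everywhere
```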
KernelAbstractions.@Const — Macro

@Const(A)

@Const is an argument annotation that asserts that the memory referenced by A is not written to as part of the kernel and that it does not alias any other memory in the kernel.
KernelAbstractions.@index — Macro

@index

The @index macro can be used to give you the index of a workitem within a kernel function. It supports the production of both a linear index and a cartesian index. A cartesian index is a general N-dimensional index that is derived from the iteration space.
Index granularity
- Global: Used to access global memory.
- Group: The index of the workgroup.
- Local: The within-workgroup index.
Index kind
- Linear: Produces an Int64 that can be used to linearly index into memory.
- Cartesian: Produces a CartesianIndex{N} that can be used to index into memory.
- NTuple: Produces an NTuple{N} that can be used to index into memory.
If the index kind is not provided it defaults to Linear; this is subject to change.
Examples
@index(Global, Linear)
@index(Global, Cartesian)
@index(Local, Cartesian)
@index(Group, Linear)
@index(Local, NTuple)
@index(Global)

KernelAbstractions.@localmem — Macro

@localmem T dims

Declare storage that is local to a workgroup.
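A sketch of workgroup-local staging, assuming KernelAbstractions.jl is loaded (the kernel name group_reverse! and the fixed workgroup size of 8 are illustrative choices):

```julia
using KernelAbstractions

# Each workgroup stages its slice of A in local memory, then
# writes it back reversed within the group.
@kernel function group_reverse!(A)
    gi = @index(Global, Linear)
    li = @index(Local, Linear)
    tile = @localmem eltype(A) (8,)   # must match the workgroup size used below
    tile[li] = A[gi]
    @synchronize()
    A[gi] = tile[8 - li + 1]
end

A = collect(1.0:16.0)
backend = CPU()
group_reverse!(backend, 8)(A, ndrange=length(A))
synchronize(backend)
# A == [8.0 … 1.0, 16.0 … 9.0]: each block of 8 is reversed in place
```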
KernelAbstractions.@private — Macro

@private T dims

Declare storage that is local to each item in the workgroup. This can be safely used across @synchronize statements. On a CPU, this will allocate additional implicit dimensions to ensure correct localization.
For storage that only persists between @synchronize statements, an MArray can be used instead.
See also @uniform.
@private mem = 1

Creates a private copy of mem for each item in the workgroup. This can be safely used across @synchronize statements.
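A sketch of per-workitem storage surviving a barrier, assuming KernelAbstractions.jl is loaded (the kernel name private_demo! is hypothetical):

```julia
using KernelAbstractions

# Each workitem keeps a running value in private storage; the
# value persists across the @synchronize barrier.
@kernel function private_demo!(out)
    li  = @index(Local, Linear)
    acc = @private Int (1,)
    acc[1] = li
    @synchronize()
    acc[1] += li                     # still this workitem's own value
    out[@index(Global, Linear)] = acc[1]
end

out = zeros(Int, 8)
backend = CPU()
private_demo!(backend, 8)(out, ndrange=length(out))
synchronize(backend)
# out == [2, 4, 6, 8, 10, 12, 14, 16]
```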
KernelAbstractions.@synchronize — Macro

@synchronize()

After a @synchronize statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup.
@synchronize(cond)

After a @synchronize statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup. cond is not allowed to have any visible side effects.
Platform differences
- GPU: This synchronization will only occur if cond evaluates to true.
- CPU: This synchronization will always occur.
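A sketch of why the barrier matters, assuming KernelAbstractions.jl is loaded (the kernel name rotate_group! and the workgroup size of 4 are illustrative): after @synchronize, a workitem can safely read a value staged by a neighbouring workitem.

```julia
using KernelAbstractions

# Within each workgroup of 4, stage values in local memory and,
# after the barrier, read the next workitem's staged value.
@kernel function rotate_group!(out, @Const(inp))
    gi  = @index(Global, Linear)
    li  = @index(Local, Linear)
    tmp = @localmem eltype(inp) (4,)
    tmp[li] = inp[gi]
    @synchronize()                       # all writes to tmp now visible
    out[gi] = tmp[li == 4 ? 1 : li + 1]
end

inp = collect(1:8)
out = zeros(Int, 8)
backend = CPU()
rotate_group!(backend, 4)(out, inp, ndrange=8)
synchronize(backend)
# out == [2, 3, 4, 1, 6, 7, 8, 5]: each group of 4 rotated left by one
```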
KernelAbstractions.@print — Macro

@print(items...)

This is a unified print statement.
Platform differences
- GPU: This will reorganize the items to print via @cuprintf.
- CPU: This will call print(items...).
KernelAbstractions.@uniform — Macro

@uniform expr

expr is evaluated outside the workitem scope. This is useful for variable declarations that span workitems, or are reused across @synchronize statements.
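A sketch of a uniform value, assuming KernelAbstractions.jl is loaded (the kernel name normalize_by_length! is hypothetical):

```julia
using KernelAbstractions

# @uniform hoists the expression out of the per-workitem scope;
# here the divisor is computed once rather than per workitem.
@kernel function normalize_by_length!(A)
    I = @index(Global, Linear)
    n = @uniform length(A)     # uniform across all workitems
    A[I] = A[I] / n
end

A = fill(8.0, 8)
backend = CPU()
normalize_by_length!(backend, 4)(A, ndrange=length(A))
synchronize(backend)
# A == fill(1.0, 8)
```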
KernelAbstractions.@groupsize — Macro

@groupsize()

Query the workgroupsize on the backend. This function returns a tuple corresponding to the kernel configuration. In order to get the total size you can use prod(@groupsize()).
KernelAbstractions.@ndrange — Macro

@ndrange()

Query the ndrange on the backend. This function returns a tuple corresponding to the kernel configuration.
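A sketch that records both queries, assuming KernelAbstractions.jl is loaded (the kernel name config_probe! is hypothetical):

```julia
using KernelAbstractions

# Record the kernel configuration as seen by each workitem.
@kernel function config_probe!(gs, nd)
    I = @index(Global, Linear)
    gs[I] = prod(@groupsize())   # total workitems per workgroup
    nd[I] = @ndrange()[1]        # extent of the launch in dimension 1
end

gs = zeros(Int, 8)
nd = zeros(Int, 8)
backend = CPU()
config_probe!(backend, 4)(gs, nd, ndrange=8)
synchronize(backend)
# gs == fill(4, 8) and nd == fill(8, 8)
```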
KernelAbstractions.synchronize — Function

synchronize(::Backend)

Synchronize the current backend.
KernelAbstractions.allocate — Function

allocate(::Backend, Type, dims...; unified=false)::AbstractArray

Allocate a storage array appropriate for the computational backend. unified=true allocates an array using unified memory if the backend supports it and throws otherwise. Use supports_unified to determine whether it is supported by a backend.
Host language
KernelAbstractions.zeros — Function

zeros(::Backend, Type, dims...; unified=false)::AbstractArray

Allocate a storage array appropriate for the computational backend, filled with zeros. unified=true allocates an array using unified memory if the backend supports it and throws otherwise.
KernelAbstractions.supports_unified — Function

supports_unified(::Backend)::Bool

Returns whether unified memory arrays are supported by the backend.
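A host-side allocation sketch, assuming KernelAbstractions.jl is loaded; on the CPU backend these return ordinary Arrays:

```julia
using KernelAbstractions

backend = CPU()

# Backend-appropriate allocation.
A = KernelAbstractions.allocate(backend, Float32, 64)   # uninitialized
B = KernelAbstractions.zeros(backend, Float32, 64)      # zero-filled

# Request unified memory only when the backend supports it,
# since allocate throws otherwise.
if KernelAbstractions.supports_unified(backend)
    C = KernelAbstractions.allocate(backend, Float32, 64; unified=true)
end
```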
Internal
KernelAbstractions.Kernel — Type

Kernel{Backend, WorkgroupSize, NDRange, Func}

Kernel closure struct that is used to represent the backend kernel on the host. WorkgroupSize is the number of workitems in a workgroup.
KernelAbstractions.partition — Function

Partition a kernel for the given ndrange and workgroupsize.
KernelAbstractions.@context — Macro

@context()

Access the hidden context object used by KernelAbstractions.
function f(@context, a)
    I = @index(Global, Linear)
    a[I]
end

@kernel cpu=false function my_kernel(a)
    f(@context, a)
end