# Interface
To extend the above functionality to a new array type, you should use the types and implement the interfaces listed on this page. GPUArrays is designed around having two different array types to represent a GPU array: one that only ever lives on the host, and one that can actually be instantiated on the device (i.e. in kernels).
## Device functionality
Several types and interfaces are related to the device and execution of code on it. First of all, you need to provide a type that represents your execution back-end and a way to call kernels:
Missing docstring for `GPUArrays.AbstractGPUBackend`. Check Documenter's build log for details.
Missing docstring for `GPUArrays.AbstractKernelContext`. Check Documenter's build log for details.
GPUArrays.gpu_call — Function

    gpu_call(kernel::Function, arg0, args...; kwargs...)

Executes `kernel` on the device that backs `arg0` (see `backend`), passing along any arguments `args`. Additionally, the kernel will be passed the kernel execution context (see `AbstractKernelContext`), so its signature should be `(ctx::AbstractKernelContext, arg0, args...)`.

The keyword arguments `kwargs` are not passed to the function, but are interpreted on the host to influence how the kernel is executed. The following keyword arguments are supported:

- `target::AbstractArray`: specify which array object to use for determining execution properties (defaults to the first argument, `arg0`).
- `elements::Int`: how many elements will be processed by this kernel. In most circumstances this corresponds to the total number of threads that need to be launched, unless the kernel supports processing a variable number of elements per iteration. Defaults to the length of `arg0` if no other keyword arguments that influence the launch configuration are specified.
- `threads::Int` and `blocks::Int`: configure exactly how many threads and blocks are launched. This cannot be used in combination with the `elements` argument.
- `name::String`: inform the back end about the name of the kernel to be executed. This can be used to emit better diagnostics, and is useful with anonymous kernels.
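As an illustration, here is a hedged sketch of launching a simple element-wise kernel through `gpu_call`, using the `JLArray` reference back-end shipped with GPUArrays. The `linear_index` helper is assumed to be available for computing a global thread index; if your GPUArrays version does not provide it, the index can be computed from `threadidx`, `blockidx`, and `blockdim` instead.

```julia
using GPUArrays, JLArrays  # JLArray: the CPU-based reference implementation

# The kernel's first argument is always the execution context.
function add_kernel(ctx::AbstractKernelContext, c, a, b)
    i = linear_index(ctx)          # global index across all launched threads
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

a = JLArray(rand(Float32, 1024))
b = JLArray(rand(Float32, 1024))
c = similar(a)

# `elements` tells the back-end how many items need processing;
# it derives a suitable threads/blocks configuration from that.
gpu_call(add_kernel, c, a, b; elements = length(c))
```

Note that `gpu_call` dispatches on the back-end of its first array argument (or the `target` keyword), so the same kernel can run unmodified on any back-end that implements this interface.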
Missing docstring for `GPUArrays.thread_block_heuristic`. Check Documenter's build log for details.
You then need to provide implementations of certain methods that will be executed on the device itself:
GPUArrays.AbstractDeviceArray — Type

    AbstractDeviceArray{T, N} <: DenseArray{T, N}

Supertype for `N`-dimensional GPU arrays (or array-like types) with elements of type `T`. Instances of this type are expected to live on the device; see `AbstractGPUArray` for host-side objects.
GPUArrays.LocalMemory — Function

Creates a block-local array pointer with `T` being the element type and `N` the length. Both `T` and `N` need to be static! `C` is a counter used to retrieve the correct local memory id in CUDAnative. This is an internal method which needs to be overloaded by the GPU array back-ends.
GPUArrays.synchronize_threads — Function

    synchronize_threads(ctx::AbstractKernelContext)

In CUDA terms: `__syncthreads`; in OpenCL terms: `barrier(CLK_LOCAL_MEM_FENCE)`.
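To show how block-local memory and thread synchronization interact, here is a hedged sketch of a kernel that reverses the elements within each block. It assumes a `@LocalMemory` macro wrapping the `LocalMemory` function above, and a fixed block size of 256 threads; both are illustrative assumptions, not guaranteed API.

```julia
# Hedged sketch: reverse each 256-element block of `inp` into `out`.
function reverse_blocks(ctx::AbstractKernelContext, out, inp)
    tmp = @LocalMemory(ctx, Float32, 256)          # assumed macro; block-local storage
    t = threadidx(ctx)                             # 1-based index within the block
    i = (blockidx(ctx) - 1) * blockdim(ctx) + t    # global index
    @inbounds tmp[t] = inp[i]
    synchronize_threads(ctx)                       # wait until the whole block has written
    @inbounds out[i] = tmp[blockdim(ctx) - t + 1]  # read the mirrored slot
    return
end
```

The barrier is essential here: without it, a thread could read `tmp` before its neighbor has written it.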
Missing docstring for `GPUArrays.blockidx`. Check Documenter's build log for details.

Missing docstring for `GPUArrays.blockdim`. Check Documenter's build log for details.

Missing docstring for `GPUArrays.threadidx`. Check Documenter's build log for details.

Missing docstring for `GPUArrays.griddim`. Check Documenter's build log for details.
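These four intrinsics are the building blocks for index calculations on the device. As a sketch, assuming the conventional CUDA-style semantics (1-based indices, `blockdim` threads per block, `griddim` blocks per grid), a grid-stride loop lets a kernel process more elements than there are launched threads:

```julia
# Hedged sketch of a grid-stride loop built from the device intrinsics.
function scale_kernel(ctx::AbstractKernelContext, y, x, α)
    i = (blockidx(ctx) - 1) * blockdim(ctx) + threadidx(ctx)  # global index
    stride = griddim(ctx) * blockdim(ctx)                     # total thread count
    while i <= length(y)
        @inbounds y[i] = α * x[i]
        i += stride
    end
    return
end
```

A kernel written this way works with any `threads`/`blocks` configuration, which is why `gpu_call`'s `elements` keyword can pick one heuristically.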
## Host abstractions

You should provide an array type that builds on the `AbstractGPUArray` supertype:
Missing docstring for `AbstractGPUArray`. Check Documenter's build log for details.
First of all, you should implement operations that are expected to be defined for any `AbstractArray` type. Refer to the Julia manual for more details, or look at the `JLArray` reference implementation.
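As a sketch of the shape such a type takes, here is a hypothetical back-end array. The type name, its fields, and the `Vector` standing in for a device memory handle are all illustrative assumptions; a real back-end would hold a device buffer and copy elements across in `getindex`/`setindex!`.

```julia
using GPUArrays

# Hypothetical host-side array type; `data` stands in for a device memory handle.
struct MyGPUArray{T, N} <: AbstractGPUArray{T, N}
    data::Vector{T}
    dims::Dims{N}
end

# The basic AbstractArray interface (see the Julia manual's Interfaces chapter):
Base.size(A::MyGPUArray) = A.dims
Base.getindex(A::MyGPUArray, i::Int) = A.data[i]          # would copy device → host
Base.setindex!(A::MyGPUArray, v, i::Int) = (A.data[i] = v) # would copy host → device
Base.similar(A::MyGPUArray, ::Type{S}, dims::Dims{N}) where {S, N} =
    MyGPUArray{S, N}(Vector{S}(undef, prod(dims)), dims)
```

With these in place, much of Base's array functionality (printing, copying, broadcasting fallbacks) works automatically, and the GPUArrays-provided methods take over where a device-aware implementation is needed.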
To be able to actually use the functionality that is defined for `AbstractGPUArray`s, you should provide implementations of the following interfaces:
Missing docstring for `GPUArrays.backend`. Check Documenter's build log for details.
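Tying the pieces together, here is a hedged sketch of the back-end glue: a singleton back-end type, a kernel context, and the `backend` method that lets `gpu_call` dispatch on the hypothetical array type from above. The names are illustrative; only the supertypes and the `backend` function come from this page.

```julia
using GPUArrays

# Hypothetical back-end and execution context (names are assumptions).
struct MyBackend <: GPUArrays.AbstractGPUBackend end
struct MyKernelContext <: AbstractKernelContext end

# Associate the array type with the back-end, so that
# `gpu_call(kernel, arg0, ...)` knows where to run the kernel.
GPUArrays.backend(::Type{<:MyGPUArray}) = MyBackend()
```

The back-end would additionally overload the internal kernel-launch entry point (and device intrinsics like `threadidx`) for `MyBackend`/`MyKernelContext`; the `JLArray` reference implementation shows a complete, minimal version of this wiring.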