Interface

To extend the above functionality to a new array type, you should use the types and implement the interfaces listed on this page. GPUArrays is designed around two different array types that represent a GPU array: one that only ever lives on the host, and one that can actually be instantiated on the device (i.e. in kernels).

Device functionality

Several types and interfaces are related to the device and execution of code on it. First of all, you need to provide a type that represents your execution back-end and a way to call kernels:

GPUArrays.AbstractGPUBackend (Type)

Abstract supertype for the types that identify an execution back-end.

GPUArrays.AbstractKernelContext (Type)

Abstract supertype for kernel execution contexts: the object that is passed as the first argument to kernels launched through gpu_call.
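To make the shape of this concrete, here is a minimal sketch of what a back-end could define. MyBackend and MyKernelContext are hypothetical names; the JLArray reference implementation follows the same pattern with its own types.

import GPUArrays

# Marker type identifying this execution back-end.
struct MyBackend <: GPUArrays.AbstractGPUBackend end

# Execution context passed as first argument to every kernel launched
# through gpu_call. What it carries is up to the back-end; a CPU-based
# back-end might simply store the current thread/block coordinates.
struct MyKernelContext <: GPUArrays.AbstractKernelContext
    threadidx::Int
    blockidx::Int
    blockdim::Int
    griddim::Int
end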

GPUArrays.gpu_call (Function)
gpu_call(kernel::Function, arg0, args...; kwargs...)

Executes kernel on the device that backs arg0 (see backend), passing along any arguments args. Additionally, the kernel will be passed the kernel execution context (see AbstractKernelContext), so its signature should be (ctx::AbstractKernelContext, arg0, args...).

The keyword arguments kwargs are not passed to the function, but are interpreted on the host to influence how the kernel is executed. The following keyword arguments are supported:

  • target::AbstractArray: specify which array object to use for determining execution properties (defaults to the first argument arg0).
  • elements::Int: how many elements will be processed by this kernel. In most circumstances, this corresponds to the total number of threads that need to be launched, unless the kernel supports processing a variable number of elements per iteration. Defaults to the length of arg0 if no other keyword arguments that influence the launch configuration are specified.
  • threads::Int and blocks::Int: configure exactly how many threads and blocks are launched. This cannot be used in combination with the elements argument.
  • name::String: inform the back end about the name of the kernel to be executed. This can be used to emit better diagnostics, and is useful with anonymous kernels.
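As a usage sketch (assuming a is some AbstractGPUArray, and using linear_index, the GPUArrays helper that derives a global index from blockidx, blockdim and threadidx), an elementwise kernel could look like this:

using GPUArrays

# In-place doubling of every element of `a`.
function double_kernel(ctx, a)
    i = linear_index(ctx)      # global linear index of this invocation
    if i <= length(a)
        @inbounds a[i] *= 2
    end
    return
end

# Launches on the device backing `a`; by default one element per thread.
gpu_call(double_kernel, a)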
GPUArrays.thread_block_heuristic (Function)

Heuristic that picks a number of threads and blocks for launching a kernel over a given number of elements.

You then need to provide implementations of certain methods that will be executed on the device itself:

GPUArrays.AbstractDeviceArray (Type)
AbstractDeviceArray{T, N} <: DenseArray{T, N}

Supertype for N-dimensional GPU arrays (or array-like types) with elements of type T. Instances of this type are expected to live on the device, see AbstractGPUArray for host-side objects.

GPUArrays.LocalMemory (Function)

Creates a block-local array pointer with T being the element type and N the length. Both T and N need to be static! C is a counter used to appropriately get the correct local memory ID in CUDAnative. This is an internal method which needs to be overloaded by the GPU array back-ends.

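A back-end overloads this for its own context type. Sketched below for the hypothetical MyKernelContext from earlier, assuming the internal signature takes the element type, length and counter as Val-wrapped static parameters (as the docstring above suggests):

import GPUArrays

# Hypothetical CPU-style implementation: block-local memory is plain
# host memory. A real back-end must ensure all threads of a block see
# the same storage for a given counter `C`; this sketch does not.
function GPUArrays.LocalMemory(ctx::MyKernelContext, ::Type{T}, ::Val{N},
                               ::Val{C}) where {T, N, C}
    return Vector{T}(undef, N)
end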
GPUArrays.blockidx (Function)

Returns the index of the block the current kernel invocation belongs to, for the given kernel context.

GPUArrays.blockdim (Function)

Returns the number of threads per block.

GPUArrays.threadidx (Function)

Returns the index of the current thread within its block.

GPUArrays.griddim (Function)

Returns the number of blocks in the grid.
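Continuing the hypothetical MyKernelContext, a CPU-style back-end could implement these intrinsics by reading fields off the context; hardware back-ends would map them to the corresponding device intrinsics instead.

import GPUArrays

GPUArrays.threadidx(ctx::MyKernelContext) = ctx.threadidx
GPUArrays.blockidx(ctx::MyKernelContext)  = ctx.blockidx
GPUArrays.blockdim(ctx::MyKernelContext)  = ctx.blockdim
GPUArrays.griddim(ctx::MyKernelContext)   = ctx.griddim

With these in place, helpers such as linear_index(ctx) can compute a global index as (blockidx(ctx) - 1) * blockdim(ctx) + threadidx(ctx).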

Host abstractions

You should provide an array type that builds on the AbstractGPUArray supertype:

AbstractGPUArray (Type)

AbstractGPUArray{T, N} <: DenseArray{T, N}

Supertype for N-dimensional GPU arrays (or array-like types) with elements of type T. Instances of this type are expected to live on the host; see AbstractDeviceArray for device-side objects.

First of all, you should implement operations that are expected to be defined for any AbstractArray type. Refer to the Julia manual for more details, or look at the JLArray reference implementation.
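For illustration, a minimal host array type could look like the following sketch; the names are hypothetical, and the Vector field stands in for a real device memory handle.

import GPUArrays

mutable struct MyArray{T, N} <: GPUArrays.AbstractGPUArray{T, N}
    data::Vector{T}           # stand-in for a handle to device memory
    dims::NTuple{N, Int}
end

Base.size(A::MyArray) = A.dims

# Scalar indexing fulfills the AbstractArray contract; on a real device
# it would involve a host-device copy and thus be slow.
Base.getindex(A::MyArray, i::Int) = A.data[i]
Base.setindex!(A::MyArray, v, i::Int) = (A.data[i] = v; A)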

To be able to actually use the functionality that is defined for AbstractGPUArrays, you should provide implementations of the following interfaces:

GPUArrays.backend (Function)

Returns the back-end that backs a given array type or object; used by gpu_call to determine where to execute a kernel.
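For the hypothetical MyArray above, this could be as simple as the following sketch, assuming backend dispatches on the array type:

GPUArrays.backend(::Type{<:MyArray}) = MyBackend()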