Interface
To extend the above functionality to a new array type, you should use the types and implement the interfaces listed on this page. GPUArrays is designed around having two different array types to represent a GPU array: one that exists only on the host, and one that actually can be instantiated on the device (i.e. in kernels). Device functionality is then handled by KernelAbstractions.jl.
Host abstractions
You should provide an array type that subtypes `AbstractGPUArray`, for example:

```julia
mutable struct CustomArray{T, N} <: AbstractGPUArray{T, N}
    data::DataRef{Vector{UInt8}}
    offset::Int
    dims::Dims{N}
    ...
end
```
This will allow your defined type (in this case `CustomArray`) to use the GPUArrays interface where available. To be able to actually use the functionality that is defined for `AbstractGPUArray`s, you also need to define a KernelAbstractions.jl backend and associate it with your array type:

```julia
import KernelAbstractions

struct CustomBackend <: KernelAbstractions.GPU end

KernelAbstractions.get_backend(a::CA) where {CA <: CustomArray} = CustomBackend()
```
There are several reference implementations of this interface, such as JLArrays.jl, CUDA.jl's `CuArray`, and AMDGPU.jl's `ROCArray`.
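To sketch how these pieces fit together, once `get_backend` is defined, KernelAbstractions kernels can be launched on your array type. The kernel and helper function below are illustrative (the names `scale_kernel!` and `scale!` are not part of any API), and `CustomArray`/`CustomBackend` refer to the hypothetical type defined above:

```julia
using KernelAbstractions

# A simple element-wise kernel; KernelAbstractions compiles it for
# whichever backend the input array reports via `get_backend`.
@kernel function scale_kernel!(y, x, a)
    i = @index(Global)
    @inbounds y[i] = a * x[i]
end

function scale!(y, x, a)
    # For a CustomArray, this returns CustomBackend() thanks to the
    # `get_backend` method defined above.
    backend = KernelAbstractions.get_backend(x)
    kernel = scale_kernel!(backend)
    kernel(y, x, a; ndrange = length(x))
    KernelAbstractions.synchronize(backend)
    return y
end
```

The same `scale!` works unchanged for any array type implementing this interface, which is the point of routing kernel launches through `get_backend`.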
Caching Allocator
GPUArrays.@cached — Macro

```julia
@cached cache expr
```

Evaluate `expr` using the allocations cache `cache`.

When GPU memory is allocated during the execution of `expr`, `cache` will first be checked. If no memory is available in the cache, a new allocation will be requested.

After the execution of `expr`, all allocations made under the scope of `@cached` will be cached within `cache` for future use. This is useful to avoid relying on the GC to free GPU memory in time.

Once `cache` goes out of scope, or when the user calls `unsafe_free!` on it, all cached allocations will be freed.
Example
In the following example, each iteration of the for-loop requires 8 GiB of GPU memory. Without caching those allocations, significant pressure would be put on the GC, resulting in high memory usage and latency. By using the allocator cache, memory usage stays stable:

```julia
using CUDA, GPUArrays

cache = GPUArrays.AllocCache()
for i in 1:1000
    GPUArrays.@cached cache begin
        sin.(CUDA.rand(Float32, 1024^3))
    end
end

# optionally: free the memory now, instead of waiting for the GC to collect `cache`
GPUArrays.unsafe_free!(cache)
```
See also `@uncached`.
GPUArrays.@uncached — Macro

```julia
@uncached expr
```

Evaluate expression `expr` without using the allocation cache. This is useful to call from within `@cached` to avoid caching some allocations, e.g. because they are returned out of the `@cached` scope.
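As a minimal sketch of how the two macros combine, the example below uses JLArrays (the CPU-backed reference implementation of `AbstractGPUArray`), assuming it participates in the allocation cache like any other backend; the variable names are illustrative:

```julia
using GPUArrays, JLArrays

cache = GPUArrays.AllocCache()

result = GPUArrays.@cached cache begin
    # This temporary allocation is cached in `cache` for reuse.
    tmp = JLArray(rand(Float32, 1024))
    # This allocation escapes the @cached scope, so it must be
    # excluded from the cache with @uncached.
    out = GPUArrays.@uncached JLArray(zeros(Float32, 1024))
    out .= sin.(tmp)
end

# `result` stays valid; only the cached allocations are freed.
GPUArrays.unsafe_free!(cache)
```

The key design point is that anything handed back to the caller must be allocated under `@uncached`, since memory owned by the cache may be reused or freed after the `@cached` block.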