Interface
To extend the above functionality to a new array type, you should use the types and implement the interfaces listed on this page. GPUArrays is designed around having two different array types to represent a GPU array: one that exists only on the host, and one that actually can be instantiated on the device (i.e. in kernels). Device functionality is then handled by KernelAbstractions.jl.
Host abstractions
You should provide an array type that subtypes `AbstractGPUArray`, for example:

```julia
mutable struct CustomArray{T, N} <: AbstractGPUArray{T, N}
    data::DataRef{Vector{UInt8}}
    offset::Int
    dims::Dims{N}
    ...
end
```
This will allow your defined type (in this case `CustomArray`) to use the GPUArrays interface where available. To be able to actually use the functionality that is defined for `AbstractGPUArray`s, you also need to define a KernelAbstractions.jl backend and associate it with your array type:

```julia
import KernelAbstractions

struct CustomBackend <: KernelAbstractions.GPU end

KernelAbstractions.get_backend(a::CA) where {CA <: CustomArray} = CustomBackend()
```
There are several reference implementations of this interface, such as JLArrays.jl, CUDA.jl's `CuArray`, and AMDGPU.jl's `ROCArray`.
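To sketch how these pieces fit together, once `get_backend` is defined, KernelAbstractions kernels can be launched on your array type. The kernel and helper function below are illustrative (the names `scale_kernel!` and `scale!` are not part of any API), and `CustomArray`/`CustomBackend` refer to the hypothetical type defined above:

```julia
using KernelAbstractions

# A simple element-wise kernel; KernelAbstractions compiles it for
# whichever backend the input array reports via `get_backend`.
@kernel function scale_kernel!(y, x, a)
    i = @index(Global)
    @inbounds y[i] = a * x[i]
end

function scale!(y, x, a)
    # For a CustomArray, this returns CustomBackend() thanks to the
    # `get_backend` method defined above.
    backend = KernelAbstractions.get_backend(x)
    kernel = scale_kernel!(backend)
    kernel(y, x, a; ndrange = length(x))
    KernelAbstractions.synchronize(backend)
    return y
end
```

The same `scale!` works unchanged for any array type implementing this interface, which is the point of routing kernel launches through `get_backend`.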
Caching Allocator
GPUArrays.@cached — Macro

```julia
@cached cache expr
```

Evaluate `expr` using the allocations cache `cache`.

When GPU memory is allocated during the execution of `expr`, `cache` will first be checked. If no memory is available in the cache, a new allocation will be requested.

After the execution of `expr`, all allocations made under the scope of `@cached` will be cached within `cache` for future use. This is useful to avoid relying on the GC to free GPU memory in time.

Once `cache` goes out of scope, or when the user calls `unsafe_free!` on it, all cached allocations will be freed.
Example
In the following example, each iteration of the for-loop requires 8 GiB of GPU memory. Without caching those allocations, significant pressure would be put on the GC, resulting in high memory usage and latency. By using the allocator cache, memory usage stays stable:

```julia
using CUDA, GPUArrays

cache = GPUArrays.AllocCache()
for i in 1:1000
    GPUArrays.@cached cache begin
        sin.(CUDA.rand(Float32, 1024^3))
    end
end

# optionally: free the memory now, instead of waiting for the GC to collect `cache`
GPUArrays.unsafe_free!(cache)
```
See also `@uncached`.
GPUArrays.@uncached — Macro

```julia
@uncached expr
```

Evaluate expression `expr` without using the allocation cache. This is useful to call from within `@cached` to avoid caching some allocations, e.g. because they are returned out of the `@cached` scope.
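As a minimal sketch of how the two macros combine, the example below uses JLArrays (the CPU-backed reference implementation of `AbstractGPUArray`), assuming it participates in the allocation cache like any other backend; the variable names are illustrative:

```julia
using GPUArrays, JLArrays

cache = GPUArrays.AllocCache()

result = GPUArrays.@cached cache begin
    # This temporary allocation is cached in `cache` for reuse.
    tmp = JLArray(rand(Float32, 1024))
    # This allocation escapes the @cached scope, so it must be
    # excluded from the cache with @uncached.
    out = GPUArrays.@uncached JLArray(zeros(Float32, 1024))
    out .= sin.(tmp)
end

# `result` stays valid; only the cached allocations are freed.
GPUArrays.unsafe_free!(cache)
```

The key design point is that anything handed back to the caller must be allocated under `@uncached`, since memory owned by the cache may be reused or freed after the `@cached` block.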