Compilation & Execution
@cuda (gridDim::CuDim, blockDim::CuDim, [shmem::Int], [stream::CuStream]) func(args...)
High-level interface for calling functions on a GPU; it queues a kernel launch on the current context. The gridDim and blockDim arguments represent the launch configuration, the optional shmem parameter specifies how many bytes of dynamic shared memory should be allocated (defaulting to 0), and the optional stream parameter indicates on which stream the launch should be scheduled.
The func argument should be a valid Julia function. It is compiled to a CUDA function upon first use, and arguments are, to a certain extent, converted and managed automatically (see cudaconvert). Finally, a call to cudacall is performed, scheduling the compiled function for execution on the GPU.
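As a rough sketch of how a one-dimensional launch configuration might be chosen, the snippet below computes a (gridDim, blockDim) pair on the CPU and checks that the usual global index formula covers every element. The helpers launch_config and covered_indices are hypothetical and not part of CUDAnative, the default of 256 threads per block is an arbitrary assumption, and the kernel name vadd and the device arrays in the final comment are placeholders.

```julia
# Hypothetical helper (not part of CUDAnative): choose a 1-D launch
# configuration that covers `n` elements with `threads` threads per block.
function launch_config(n::Integer; threads::Integer = 256)
    blocks = cld(n, threads)          # ceiling division: enough blocks to cover n
    return (blocks, threads)          # (gridDim, blockDim)
end

# CPU-side emulation of the usual global index computation, using the
# 1-based indices that Julia kernels see:
#   i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
function covered_indices(n::Integer, blocks::Integer, threads::Integer)
    seen = Int[]
    for b in 1:blocks, t in 1:threads
        i = (b - 1) * threads + t
        i <= n && push!(seen, i)      # the bounds check a kernel would need
    end
    return seen
end

grid, block = launch_config(1000)
indices = covered_indices(1000, grid, block)

# With CUDAnative loaded, a kernel `vadd` defined, and device arrays
# d_a, d_b, d_c allocated (all placeholders), the launch would read roughly:
#   @cuda (grid, block) vadd(d_a, d_b, d_c)
```

Because the grid is rounded up, the last block usually contains threads whose index exceeds the data length, which is why the kernel body needs the bounds check emulated above.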
cudaconvert is called for every argument passed to a kernel, allowing it to be converted to a GPU-friendly format. By default, the function does nothing and returns the input object unchanged. For CuArray objects, a corresponding CuDeviceArray object in global space is returned, which implements GPU-compatible array functionality.
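The pass-through-by-default, specialize-per-type behaviour described here maps naturally onto Julia's multiple dispatch. The sketch below illustrates that pattern only; toy_convert, HostBuffer and DeviceView are hypothetical stand-ins, not CUDAnative types or functions, and the real conversion of a CuArray involves device pointers that are not modelled here.

```julia
# Hypothetical types and names, for illustration only; none of these are
# part of CUDAnative.
struct HostBuffer            # stands in for a host-side GPU array handle
    data::Vector{Float32}
end

struct DeviceView            # stands in for the device-side counterpart
    len::Int
end

# Fallback method: most values are already GPU-friendly and pass through
# unchanged.
toy_convert(x) = x

# Specialized method: swap the host handle for its device-side view.
toy_convert(b::HostBuffer) = DeviceView(length(b.data))

# Convert every argument before a (hypothetical) kernel launch.
args = (42, 1.5f0, HostBuffer(zeros(Float32, 8)))
converted = map(toy_convert, args)
```

Under this design, supporting a new argument type only requires adding one more method, without touching the launch machinery.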
Return the nearest number of threads that is a multiple of the warp size of a device. This is a common requirement, e.g. when using shuffle intrinsics.
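A minimal sketch of the rounding, assuming that "nearest" means rounding up so the result is never smaller than the requested thread count. The name round_to_warps is hypothetical, and the warp size is passed explicitly (defaulting to 32, the value on current NVIDIA GPUs) instead of being queried from a device.

```julia
# Hypothetical stand-in for the documented function: round `threads` up to
# the next multiple of `warpsize`. Rounding up is an assumption.
round_to_warps(threads::Integer, warpsize::Integer = 32) =
    cld(threads, warpsize) * warpsize

full_warp  = round_to_warps(32)    # already a multiple of the warp size
next_warp  = round_to_warps(33)    # spills into a second warp
four_warps = round_to_warps(100)   # rounds up to four warps
```

Launching with a whole number of warps means every lane taking part in a shuffle belongs to a thread that actually exists.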