Device Intrinsics
When writing custom kernels, you have access to a set of device intrinsics that map to underlying hardware instructions.
Indexing
These functions allow you to determine the current thread's position in the execution grid.
get_global_id(dim=0): Global index of the work item.get_local_id(dim=0): Local index of the work item within the workgroup.get_group_id(dim=0): Index of the workgroup.get_global_size(dim=0): Global size of the ND-range.get_local_size(dim=0): Size of the workgroup.get_num_groups(dim=0): Number of workgroups.
Synchronization
barrier(flags=0): Synchronizes all work items in a workgroup.
Atomics
Atomic operations are supported for thread-safe updates to memory.
atomic_add!(ptr, val)atomic_sub!(ptr, val)atomic_inc!(ptr)atomic_dec!(ptr)atomic_min!(ptr, val)atomic_max!(ptr, val)atomic_and!(ptr, val)atomic_or!(ptr, val)atomic_xor!(ptr, val)atomic_cmpxchg!(ptr, cmp, val)
Supported types for atomics generally include Int32, Int64, UInt32, UInt64, Float32, and Float64.
Math Functions
Standard math functions from Julia's Base are supported within kernels (e.g., sin, cos, exp, sqrt).