Array Operations

This page documents the array types and operations provided by oneAPI.jl.

Array Types

Host-Side Arrays

oneArray{T,N,B}

N-dimensional dense array type for Intel GPU programming using oneAPI and Level Zero.

Type Parameters:

  • T: Element type (must be stored inline, no isbits-unions)
  • N: Number of dimensions
  • B: Buffer type, one of:
    • oneL0.DeviceBuffer: GPU device memory (default, not CPU-accessible)
    • oneL0.SharedBuffer: Unified shared memory (CPU and GPU accessible)
    • oneL0.HostBuffer: Pinned host memory (CPU-accessible, GPU-visible)

Type Aliases:

  • oneVector{T} = oneArray{T,1} - 1D array
  • oneMatrix{T} = oneArray{T,2} - 2D array
  • oneVecOrMat{T} = Union{oneVector{T}, oneMatrix{T}} - 1D or 2D array
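
As a quick check of how the aliases relate to the type parameters, the following sketch builds a vector and a matrix and tests them against each alias. It assumes a functional Intel GPU and that the aliases are exported by oneAPI.jl, as the unqualified names above suggest.

```julia
using oneAPI

v = oneArray{Float32}(undef, 4)      # N = 1, default buffer type
m = oneArray{Float32}(undef, 2, 3)   # N = 2

v isa oneVector{Float32}             # true
m isa oneMatrix{Float32}             # true
m isa oneVecOrMat{Float32}           # true: matches either alias
(eltype(m), ndims(m))                # (Float32, 2)
```

The buffer parameter B is left free in the aliases, so arrays of any memory type match them.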

Device-Side Arrays

oneDeviceArray{T,N,A}

Device-side array type for use within GPU kernels. This type represents a view of GPU memory accessible within kernel code. Unlike oneArray which is used on the host, oneDeviceArray is designed for device-side operations and cannot be directly constructed on the host.

Type Parameters:

  • T: Element type
  • N: Number of dimensions
  • A: Address space (typically AS.CrossWorkgroup for global memory)

Type Aliases:

  • oneDeviceVector{T} = oneDeviceArray{T,1} - 1D device array
  • oneDeviceMatrix{T} = oneDeviceArray{T,2} - 2D device array
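
The relationship between the two array types is easiest to see in a kernel launch. The sketch below follows the vector-addition pattern from the oneAPI.jl README and assumes a functional Intel GPU; the kernel name vadd! is illustrative. The host passes oneArray values, and inside the kernel the same arguments appear as device-side arrays.

```julia
using oneAPI

# Inside the kernel, `c`, `a`, and `b` are device-side arrays, not oneArrays.
function vadd!(c, a, b)
    i = get_global_id()              # global index of this work-item
    @inbounds c[i] = a[i] + b[i]
    return                           # kernels return nothing
end

a = oneArray(rand(Float32, 16))
b = oneArray(rand(Float32, 16))
c = similar(a)

# One work-item per element; the oneArray arguments are converted at launch.
@oneapi items=length(c) vadd!(c, a, b)

Array(c) ≈ Array(a) .+ Array(b)      # compare against the CPU result
```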

oneLocalArray(::Type{T}, dims)

Allocate local (workgroup-shared) memory within a GPU kernel. Local memory is shared among all work-items in a workgroup and provides faster access than global memory.

Memory Type Queries

is_device(a::oneArray) -> Bool

Check if the array is stored in device memory (not directly CPU-accessible).

is_shared(a::oneArray) -> Bool

Check if the array is stored in shared (unified) memory, accessible from both CPU and GPU.

is_host(a::oneArray) -> Bool

Check if the array is stored in pinned host memory, which resides on the CPU but is visible to the GPU.

Array Construction

oneArray supports multiple construction patterns similar to standard Julia arrays:

using oneAPI

# Uninitialized arrays
a = oneArray{Float32}(undef, 100)
b = oneArray{Float32,2}(undef, 10, 10)

# Specify memory type
c = oneArray{Float32,1,oneL0.SharedBuffer}(undef, 100)  # Shared memory
d = oneArray{Float32,1,oneL0.HostBuffer}(undef, 100)    # Host memory

# From existing arrays
e = oneArray(rand(Float32, 100))
f = oneArray([1, 2, 3, 4])

# Using zeros/ones/rand
g = oneAPI.zeros(Float32, 100)
h = oneAPI.ones(Float32, 100)
i = oneAPI.rand(Float32, 100)

# Do-block for automatic cleanup
result = oneArray{Float32}(undef, 100) do arr
    arr .= 1.0f0
    sum(arr)  # Returns result, arr is freed automatically
end

Array Operations

oneArray implements the AbstractArray interface and supports the following operations:

Broadcasting

a = oneArray(rand(Float32, 100))
b = oneArray(rand(Float32, 100))

c = a .+ b          # Element-wise addition
d = a .* 2.0f0      # Scalar multiplication
e = sin.(a)         # Unary operations
f = a .+ b .* c     # Fused operations

Reductions

using Statistics  # provides mean

a = oneArray(rand(Float32, 100))

s = sum(a)          # Sum
p = prod(a)         # Product
m = maximum(a)      # Maximum
n = minimum(a)      # Minimum
μ = mean(a)         # Mean (requires Statistics)

Mapping

a = oneArray(rand(Float32, 100))

b = map(x -> x^2, a)        # Apply function
c = map(+, a, b)            # Binary operation

Accumulation

a = oneArray([1, 2, 3, 4])

b = cumsum(a)       # Cumulative sum: [1, 3, 6, 10]
c = cumprod(a)      # Cumulative product: [1, 2, 6, 24]

Finding Elements

a = oneArray([1.0f0, -2.0f0, 3.0f0, -4.0f0])

indices = findall(x -> x > 0, a)  # Indices of positive elements

Random Number Generation

using oneAPI, Random

# Uniform distribution
a = oneAPI.rand(Float32, 100)
b = oneAPI.rand(Float32, 10, 10)

# Normal distribution
c = oneAPI.randn(Float32, 100)

# With seed
Random.seed!(1234)
d = oneAPI.rand(Float32, 100)

Data Transfer

CPU to GPU

# Using constructor
h_array = rand(Float32, 100)
d_array = oneArray(h_array)

# Using copyto!
d_array = oneArray{Float32}(undef, 100)
copyto!(d_array, h_array)

GPU to CPU

# Using Array constructor
h_array = Array(d_array)

# Using copyto!
h_array = Vector{Float32}(undef, 100)
copyto!(h_array, d_array)

GPU to GPU

d_array1 = oneArray(rand(Float32, 100))
d_array2 = similar(d_array1)
copyto!(d_array2, d_array1)

Memory Types Comparison

Memory Type      | CPU Access | GPU Access | Performance | Use Case
Device (default) | ❌ No      | ✅ Fast    | Fastest     | GPU computations
Shared           | ✅ Yes     | ✅ Good    | Good        | CPU-GPU data sharing
Host             | ✅ Yes     | ✅ Slower  | Moderate    | Staging, pinned buffers

# Device memory (default, fastest for GPU)
a = oneArray{Float32}(undef, 100)

# Shared memory (CPU and GPU accessible)
b = oneArray{Float32,1,oneL0.SharedBuffer}(undef, 100)

# Host memory (CPU memory visible to GPU)
c = oneArray{Float32,1,oneL0.HostBuffer}(undef, 100)

# Query memory type
is_device(a)  # true
is_shared(b)  # true
is_host(c)    # true

Views and Slicing

oneArray supports array views for efficient sub-array operations without copying:

a = oneArray(rand(Float32, 100))

# Create a view
v = view(a, 1:50)
v .= 0.0f0  # Modifies first 50 elements of a

# Indexing with a range allocates a new array (a copy);
# use view or @views when a copy is not wanted
s = a[1:50]  # This is a copy, not a view
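
The difference between a view and a range-indexed slice shows up when writing through each of them. A minimal sketch, assuming a functional Intel GPU:

```julia
using oneAPI

a = oneAPI.zeros(Float32, 8)

v = view(a, 1:4)   # shares memory with a
v .= 1.0f0         # writes through to the first four elements of a

s = a[5:8]         # range indexing allocates a new array
s .= 2.0f0         # modifies only the copy

Array(a)           # Float32[1, 1, 1, 1, 0, 0, 0, 0]
```

As with standard Julia arrays, only the write through the view is visible in the parent array.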

Reshaping

a = oneArray(rand(Float32, 100))

# Reshape to 2D
b = reshape(a, 10, 10)

# Flatten
c = vec(b)  # Returns a 1D array that shares memory with b
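
Reshaping does not copy data, so a write through the reshaped array is visible in the original. The sketch below illustrates this, assuming a functional Intel GPU:

```julia
using oneAPI

a = oneAPI.zeros(Float32, 6)
b = reshape(a, 2, 3)    # same memory, viewed as a 2×3 matrix
b .= 1.0f0              # also changes a

size(b)                 # (2, 3)
Array(a)                # Float32[1, 1, 1, 1, 1, 1]
length(vec(b))          # 6
```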

Advanced: Custom Array Wrappers

For advanced use cases, oneAPI.jl provides type aliases for array wrappers:

  • oneDenseArray: Dense contiguous arrays
  • oneStridedArray: Arrays with arbitrary strides (including views)
  • oneWrappedArray: Any array backed by a oneArray

These are useful for writing functions that accept various array types:

function my_kernel!(a::oneStridedArray{Float32})
    # Accepts oneArray and views
    a .+= 1.0f0
end
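
A function typed this way can be called with either a plain oneArray or a strided view of one. The sketch below is illustrative only: add_one! is a hypothetical helper that mirrors my_kernel! above, the alias is written with the oneAPI. prefix in case it is not exported, and a functional Intel GPU is assumed.

```julia
using oneAPI

# Hypothetical helper mirroring my_kernel! above.
function add_one!(a::oneAPI.oneStridedArray{Float32})
    a .+= 1.0f0
    return a
end

m = oneAPI.zeros(Float32, 4, 4)

add_one!(m)                # plain oneArray: every element becomes 1
add_one!(view(m, 1, :))    # strided view of the first row: that row becomes 2

Array(m)[1, :]             # Float32[2, 2, 2, 2]
Array(m)[2, :]             # Float32[1, 1, 1, 1]
```

Annotating the argument with the wrapper alias rather than oneArray keeps the method usable for views without forcing callers to copy.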