Map

AcceleratedKernels.map! — Function
map!(
    f, dst::AbstractArray, src::AbstractArray, backend::Backend=get_backend(src);

    # CPU settings
    max_tasks=Threads.nthreads(),
    min_elems=1,

    # GPU settings
    block_size=256,
)

Apply the function f to each element of src in parallel and store the result in dst. The CPU and GPU settings are the same as for foreachindex.

On CPUs, multithreading improves performance only when the computation is complex enough to hide the memory latency and the overhead of spawning tasks - that favours more complex functions and less cache-local array access patterns. Compute-bound workloads scale roughly linearly with the number of threads.
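As an illustrative sketch of the CPU settings, the following runs map! on a plain Array, capping the number of tasks and setting a minimum number of elements per task (the specific max_tasks and min_elems values here are hypothetical tuning choices, not recommendations):

```julia
import AcceleratedKernels as AK

x = rand(Float32, 1_000_000)
y = similar(x)

# Use at most 4 tasks, and require at least 10_000 elements per task;
# below that, the cost of spawning tasks can outweigh the parallel speedup.
AK.map!(y, x; max_tasks=4, min_elems=10_000) do v
    sin(v) + cos(v)   # enough arithmetic per element to benefit from threading
end
```

On a CPU backend the same call works without any GPU package loaded; the keyword arguments are simply ignored when running on a GPU backend, where block_size applies instead.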

Examples

import Metal
import AcceleratedKernels as AK

x = Metal.MtlArray(rand(Float32, 100_000))
y = similar(x)
AK.map!(y, x) do x_elem
    T = typeof(x_elem)
    T(2) * x_elem + T(1)
end
AcceleratedKernels.map — Function
map(
    f, src::AbstractArray, backend::Backend=get_backend(src);

    # CPU settings
    max_tasks=Threads.nthreads(),
    min_elems=1,

    # GPU settings
    block_size=256,
)

Apply the function f to each element of src and return the results in a newly allocated array of the same type and eltype as src (if f changes the eltype, allocate dst yourself and call map! instead). The CPU and GPU settings are the same as for foreachindex.
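For instance, when f maps Float32 inputs to a different element type, the out-of-place map cannot be used directly; a minimal CPU sketch of the allocate-then-map! pattern looks like this:

```julia
import AcceleratedKernels as AK

x = rand(Float32, 10_000)

# map would allocate the output with eltype Float32, so for a Float64
# result allocate the destination with the new eltype explicitly:
dst = similar(x, Float64)
AK.map!(v -> Float64(v) * 2.0, dst, x)
```

The same pattern applies on GPU arrays: similar(x, T) allocates on the same device, so the map! call stays on the original backend.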
