Arithmetics
`AcceleratedKernels.sum` — Function

```julia
sum(
    src::AbstractArray, backend::Backend=get_backend(src);
    init=zero(eltype(src)),
    dims::Union{Nothing, Int}=nothing,

    # CPU settings
    scheduler=:static,
    max_tasks=Threads.nthreads(),
    min_elems=1,

    # GPU settings
    block_size::Int=256,
    temp::Union{Nothing, AbstractArray}=nothing,
    switch_below::Int=0,
)
```

Sum of the elements of an array, with optional `init` and `dims`. Arguments are the same as for `reduce`.
Examples

Simple sum of the elements in a vector:

```julia
import AcceleratedKernels as AK
using Metal

v = MtlArray(rand(Int32(1):Int32(100), 100_000))
s = AK.sum(v)
```

Row-wise sum of a matrix:

```julia
m = MtlArray(rand(Int32(1):Int32(100), 10, 100_000))
s = AK.sum(m, dims=1)
```

If you know the shape of the resulting array (in the case of an axis-wise sum, i.e. `dims` is not `nothing`), you can provide the `temp` argument to save results into and avoid allocations:

```julia
m = MtlArray(rand(Int32(1):Int32(100), 10, 100_000))
temp = MtlArray(zeros(Int32, 10))
s = AK.sum(m, dims=2, temp=temp)
```
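The CPU settings in the signature above (`scheduler`, `max_tasks`, `min_elems`) apply when the input is a plain `Array` and the reduction runs on CPU threads. A minimal sketch of tuning them; the specific values here are illustrative, not recommendations:

```julia
import AcceleratedKernels as AK

v = rand(Int32(1):Int32(100), 100_000)

# Use at most 4 tasks, each handling at least 10_000 elements,
# so small inputs are not split into many tiny tasks
s = AK.sum(v; max_tasks=4, min_elems=10_000)
```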
`AcceleratedKernels.prod` — Function

```julia
prod(
    src::AbstractArray, backend::Backend=get_backend(src);
    init=one(eltype(src)),
    dims::Union{Nothing, Int}=nothing,

    # CPU settings
    scheduler=:static,
    max_tasks=Threads.nthreads(),
    min_elems=1,

    # GPU settings
    block_size::Int=256,
    temp::Union{Nothing, AbstractArray}=nothing,
    switch_below::Int=0,
)
```

Product of the elements of an array, with optional `init` and `dims`. Arguments are the same as for `reduce`.
Examples

Simple product of the elements in a vector:

```julia
import AcceleratedKernels as AK
using AMDGPU

v = ROCArray(rand(Int32(1):Int32(100), 100_000))
p = AK.prod(v)
```

Row-wise product of a matrix:

```julia
m = ROCArray(rand(Int32(1):Int32(100), 10, 100_000))
p = AK.prod(m, dims=1)
```

If you know the shape of the resulting array (in the case of an axis-wise product, i.e. `dims` is not `nothing`), you can provide the `temp` argument to save results into and avoid allocations:

```julia
m = ROCArray(rand(Int32(1):Int32(100), 10, 100_000))
temp = ROCArray(ones(Int32, 10))
p = AK.prod(m, dims=2, temp=temp)
```
`AcceleratedKernels.minimum` — Function

```julia
minimum(
    src::AbstractArray, backend::Backend=get_backend(src);
    init=typemax(eltype(src)),
    dims::Union{Nothing, Int}=nothing,

    # CPU settings
    scheduler=:static,
    max_tasks=Threads.nthreads(),
    min_elems=1,

    # GPU settings
    block_size::Int=256,
    temp::Union{Nothing, AbstractArray}=nothing,
    switch_below::Int=0,
)
```

Minimum of the elements of an array, with optional `init` and `dims`. Arguments are the same as for `reduce`.
Examples

Simple minimum of the elements in a vector:

```julia
import AcceleratedKernels as AK
using CUDA

v = CuArray(rand(Int32(1):Int32(100), 100_000))
mn = AK.minimum(v)
```

Row-wise minimum of a matrix:

```julia
m = CuArray(rand(Int32(1):Int32(100), 10, 100_000))
mn = AK.minimum(m, dims=1)
```

If you know the shape of the resulting array (in the case of an axis-wise minimum, i.e. `dims` is not `nothing`), you can provide the `temp` argument to save results into and avoid allocations:

```julia
m = CuArray(rand(Int32(1):Int32(100), 10, 100_000))
temp = CuArray(ones(Int32, 10))
mn = AK.minimum(m, dims=2, temp=temp)
```
`AcceleratedKernels.maximum` — Function

```julia
maximum(
    src::AbstractArray, backend::Backend=get_backend(src);
    init=typemin(eltype(src)),
    dims::Union{Nothing, Int}=nothing,

    # CPU settings
    scheduler=:static,
    max_tasks=Threads.nthreads(),
    min_elems=1,

    # GPU settings
    block_size::Int=256,
    temp::Union{Nothing, AbstractArray}=nothing,
    switch_below::Int=0,
)
```

Maximum of the elements of an array, with optional `init` and `dims`. Arguments are the same as for `reduce`.
Examples

Simple maximum of the elements in a vector:

```julia
import AcceleratedKernels as AK
using oneAPI

v = oneArray(rand(Int32(1):Int32(100), 100_000))
mx = AK.maximum(v)
```

Row-wise maximum of a matrix:

```julia
m = oneArray(rand(Int32(1):Int32(100), 10, 100_000))
mx = AK.maximum(m, dims=1)
```

If you know the shape of the resulting array (in the case of an axis-wise maximum, i.e. `dims` is not `nothing`), you can provide the `temp` argument to save results into and avoid allocations:

```julia
m = oneArray(rand(Int32(1):Int32(100), 10, 100_000))
temp = oneArray(zeros(Int32, 10))
mx = AK.maximum(m, dims=2, temp=temp)
```
`AcceleratedKernels.count` — Function

```julia
count(
    [f=identity], src::AbstractArray, backend::Backend=get_backend(src);
    init=0,
    dims::Union{Nothing, Int}=nothing,

    # CPU settings
    scheduler=:static,
    max_tasks=Threads.nthreads(),
    min_elems=1,

    # GPU settings
    block_size::Int=256,
    temp::Union{Nothing, AbstractArray}=nothing,
    switch_below::Int=0,
)
```

Count the number of elements in `src` for which the function `f` returns `true`. If `f` is omitted, count the number of `true` elements in `src`. Arguments are the same as for `mapreduce`.
Examples

Simple count of `true` elements in a vector:

```julia
import AcceleratedKernels as AK
using Metal

v = MtlArray(rand(Bool, 100_000))
c = AK.count(v)
```

Count of elements greater than 50 in a vector:

```julia
v = MtlArray(rand(Int32(1):Int32(100), 100_000))
c = AK.count(x -> x > 50, v)
```

Row-wise count of `true` elements in a matrix:

```julia
m = MtlArray(rand(Bool, 10, 100_000))
c = AK.count(m, dims=1)
```

If you know the shape of the resulting array (in the case of an axis-wise count, i.e. `dims` is not `nothing`), you can provide the `temp` argument to save results into and avoid allocations:

```julia
m = MtlArray(rand(Bool, 10, 100_000))
temp = MtlArray(zeros(Int32, 10))
c = AK.count(m, dims=2, temp=temp)
```
`AcceleratedKernels.cumsum` — Function

```julia
cumsum(
    src::AbstractArray, backend::Backend=get_backend(src);
    init=zero(eltype(src)),
    neutral=zero(eltype(src)),
    dims::Union{Nothing, Int}=nothing,

    # Algorithm choice
    alg::AccumulateAlgorithm=DecoupledLookback(),

    # GPU settings
    block_size::Int=256,
    temp::Union{Nothing, AbstractArray}=nothing,
    temp_flags::Union{Nothing, AbstractArray}=nothing,
)
```

Cumulative sum of the elements of an array, with optional `init` and `dims`. Arguments are the same as for `accumulate`.

Platform-Specific Notes

On Apple Metal, the `alg=ScanPrefixes()` algorithm is used by default.
Examples

Simple cumulative sum of the elements in a vector:

```julia
import AcceleratedKernels as AK
using AMDGPU

v = ROCArray(rand(Int32(1):Int32(100), 100_000))
s = AK.cumsum(v)
```

Row-wise cumulative sum of a matrix:

```julia
m = ROCArray(rand(Int32(1):Int32(100), 10, 100_000))
s = AK.cumsum(m, dims=1)
```
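The `alg` keyword selects the accumulation algorithm; as noted above, Apple Metal defaults to `ScanPrefixes()`, but you can also request it explicitly on other backends. A hedged sketch, assuming the algorithm types are accessible under the `AK` module prefix (the choice shown is illustrative, not a performance recommendation):

```julia
import AcceleratedKernels as AK
using AMDGPU

v = ROCArray(rand(Int32(1):Int32(100), 100_000))

# Opt into the ScanPrefixes algorithm instead of the DecoupledLookback default
s = AK.cumsum(v; alg=AK.ScanPrefixes())
```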
`AcceleratedKernels.cumprod` — Function

```julia
cumprod(
    src::AbstractArray, backend::Backend=get_backend(src);
    init=one(eltype(src)),
    neutral=one(eltype(src)),
    dims::Union{Nothing, Int}=nothing,

    # Algorithm choice
    alg::AccumulateAlgorithm=DecoupledLookback(),

    # GPU settings
    block_size::Int=256,
    temp::Union{Nothing, AbstractArray}=nothing,
    temp_flags::Union{Nothing, AbstractArray}=nothing,
)
```

Cumulative product of the elements of an array, with optional `init` and `dims`. Arguments are the same as for `accumulate`.

Platform-Specific Notes

On Apple Metal, the `alg=ScanPrefixes()` algorithm is used by default.
Examples

Simple cumulative product of the elements in a vector:

```julia
import AcceleratedKernels as AK
using oneAPI

v = oneArray(rand(Int32(1):Int32(100), 100_000))
p = AK.cumprod(v)
```

Row-wise cumulative product of a matrix:

```julia
m = oneArray(rand(Int32(1):Int32(100), 10, 100_000))
p = AK.cumprod(m, dims=1)
```