Multithreaded Task Partitioning

AcceleratedKernels.TaskPartitioner - Type

Partition num_elems elements / jobs over at most max_tasks tasks, with at least min_elems elements per task.

Methods

TaskPartitioner(num_elems, max_tasks=Threads.nthreads(), min_elems=1)
Base.getindex(tp::TaskPartitioner, itask::Integer)
Base.firstindex(tp::TaskPartitioner)
Base.lastindex(tp::TaskPartitioner)
Base.length(tp::TaskPartitioner)

Fields

  • num_elems::Int : (user-defined) number of elements / jobs to partition.
  • max_tasks::Int : (user-defined) maximum number of tasks to use.
  • min_elems::Int : (user-defined) minimum number of elements per task.
  • num_tasks::Int : (computed) number of tasks actually needed.
  • task_istarts::Vector{Int} : (computed) element starting index for each task.
  • tasks::Vector{Task} : (computed) array of tasks; can be reused.
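The computed fields can be reproduced with plain-Julia arithmetic. The sketch below assumes the likely scheme - even division with the remainder spread over the first tasks - and is illustrative, not the package's actual implementation:

```julia
# Sketch: reproduce num_tasks and the per-task ranges for num_elems = 10,
# max_tasks = 4, min_elems = 1 (assumed arithmetic: even division, with
# the remainder given to the first tasks).
num_elems, max_tasks, min_elems = 10, 4, 1

# Use fewer tasks if there is not enough work for all of them
num_tasks = clamp(num_elems ÷ min_elems, 1, max_tasks)

# Starting index of each task; task i covers task_istarts[i]:task_istarts[i+1]-1
per_task, rem = divrem(num_elems, num_tasks)
task_istarts = cumsum([1; fill(per_task + 1, rem); fill(per_task, num_tasks - rem)])

ranges = [task_istarts[i]:task_istarts[i+1]-1 for i in 1:num_tasks]
# ranges == [1:3, 4:6, 7:8, 9:10], matching the first example below
```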

Examples

using AcceleratedKernels: TaskPartitioner

# Divide 10 elements between 4 tasks
tp = TaskPartitioner(10, 4)
for i in 1:tp.num_tasks
    @show tp[i]
end

# output
tp[i] = 1:3
tp[i] = 4:6
tp[i] = 7:8
tp[i] = 9:10

using AcceleratedKernels: TaskPartitioner

# Divide 20 elements between 6 tasks with minimum 5 elements per task.
# Not all tasks will be required
tp = TaskPartitioner(20, 6, 5)
for i in 1:tp.num_tasks
    @show tp[i]
end

# output
tp[i] = 1:5
tp[i] = 6:10
tp[i] = 11:15
tp[i] = 16:20

The TaskPartitioner is used internally by task_partition and itask_partition; you can construct one manually to reuse the same partitioning across multiple parallel calls - this also reuses the tasks array and minimises allocations.

AcceleratedKernels.task_partition - Function
task_partition(f, num_elems, max_tasks=Threads.nthreads(), min_elems=1)
task_partition(f, tp::TaskPartitioner)

Partition num_elems jobs across at most max_tasks parallel tasks with at least min_elems per task, calling f(start_index:end_index), where the indices are between 1 and num_elems.

Examples

A toy example showing outputs:

num_elems = 4
task_partition(println, num_elems)

# Output, possibly in a different order due to thread scheduling
1:1
4:4
2:2
3:3

This function is probably most useful with a do-block, e.g.:

task_partition(4) do irange
    some_long_computation(param1, param2, irange)
end

The TaskPartitioner form allows you to reuse the same partitioning across multiple parallel calls - this also reuses the tasks array and minimises allocations:

tp = TaskPartitioner(4)
task_partition(tp) do irange
    some_long_computation(param1, param2, irange)
end
# Reuse same partitioning and tasks array
task_partition(tp) do irange
    some_other_long_computation(param1, param2, irange)
end
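Conceptually, task_partition spawns one task per contiguous index range and waits for all of them to finish. The sketch below uses Base threading primitives and is an assumed simplification, not the package's actual implementation (which also reuses its tasks array):

```julia
# Simplified sketch of task_partition's behaviour (assumed, not the
# package's actual code): one Threads.@spawn per contiguous range.
function task_partition_sketch(f, num_elems, max_tasks=Threads.nthreads())
    num_tasks = min(max_tasks, num_elems)
    per_task, rem = divrem(num_elems, num_tasks)
    istart = 1
    tasks = Vector{Task}(undef, num_tasks)
    for i in 1:num_tasks
        len = per_task + (i <= rem ? 1 : 0)     # first `rem` tasks get one extra
        irange = istart:istart+len-1
        tasks[i] = Threads.@spawn f($irange)    # run f on this task's range
        istart += len
    end
    foreach(wait, tasks)                        # block until all tasks finish
end

# Forcing 4 tasks prints 1:1, 2:2, 3:3, 4:4 in some thread-dependent order
task_partition_sketch(println, 4, 4)
```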
AcceleratedKernels.itask_partition - Function
itask_partition(f, num_elems, max_tasks=Threads.nthreads(), min_elems=1)
itask_partition(f, tp::TaskPartitioner)

Partition num_elems jobs across at most max_tasks parallel tasks with at least min_elems per task, calling f(itask, start_index:end_index), where the indices are between 1 and num_elems.

Examples

A toy example showing outputs:

num_elems = 4
itask_partition(num_elems) do itask, irange
    @show (itask, irange)
end

# Output, possibly in a different order due to thread scheduling
(itask, irange) = (3, 3:3)
(itask, irange) = (1, 1:1)
(itask, irange) = (2, 2:2)
(itask, irange) = (4, 4:4)

This function is probably most useful with a do-block, e.g.:

itask_partition(4) do itask, irange
    some_long_computation_needing_itask(param1, param2, irange)
end

The TaskPartitioner form allows you to reuse the same partitioning across multiple parallel calls - this also reuses the tasks array and minimises allocations:

tp = TaskPartitioner(4)
itask_partition(tp) do itask, irange
    some_long_computation_needing_itask(param1, param2, irange)
end
# Reuse same partitioning and tasks array
itask_partition(tp) do itask, irange
    some_other_long_computation_needing_itask(param1, param2, irange)
end
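The task index makes lock-free per-task accumulation straightforward: each task writes into its own slot of a preallocated buffer, and the slots are combined afterwards. A sketch assuming the API documented above (the buffer name and the sum reduction are illustrative):

```julia
using AcceleratedKernels: TaskPartitioner, itask_partition

# Sketch: per-task partial sums without locks, one slot per task
tp = TaskPartitioner(1000, 4)
partials = zeros(Int, tp.num_tasks)      # one accumulator per task
itask_partition(tp) do itask, irange
    partials[itask] = sum(irange)        # each task writes only its own slot
end
total = sum(partials)                    # total == sum(1:1000) == 500500
```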