# Debugging Kernels
As the compilation pipeline for GPU kernels is different from that of base Julia, error messages also look different. For example, where plain Julia would throw an exception for an undefined variable name (e.g. a typo), a GPU kernel that throws exceptions cannot be compiled; instead you'll see cascading errors like "[...] compiling [...] resulted in invalid LLVM IR", caused by "Reason: unsupported use of an undefined name", resulting in "Reason: unsupported dynamic function invocation", and so on.

Thankfully, there are only a few types of such error messages, and they're not that scary once you look into them.
## Undefined Variables / Typos
If you misspell a variable name, normal Julia would simply throw an `UndefVarError` - for example, note the typo in this kernel:
```julia
import AcceleratedKernels as AK    # AK is used throughout this page

function set_color(v, color)
    AK.foreachindex(v) do i
        v[i] = colour   # Grab your porridge
    end
end
```
However, exceptions cannot be compiled on GPUs, and you will see cascading errors like the ones below:
The key thing to look for is `undefined name`, then search for that name in your code.
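Once you find it, the fix is simply to use the correct name - here is a minimal corrected sketch of the kernel above, using the `color` argument the function already receives:

```julia
function set_color(v, color)
    AK.foreachindex(v) do i
        v[i] = color    # the argument that was actually passed in
    end
end
```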
## Exceptions and Checks that Throw
As mentioned above, exceptions cannot be compiled in GPU kernels; however, many normal-looking functions that we call in kernels contain argument checks. If the compiler cannot prove that a checking branch will never throw, you will see a similar cascade of errors. For example, casting a `Float32` to an `Int32` includes an `InexactError` check - see this tame-looking code:
```julia
using Metal    # provides MtlArray

function mymul!(v)
    AK.foreachindex(v) do i
        v[i] *= 2f0
    end
end

v = MtlArray(1:1000)
mymul!(v)
```
See any problem with it? The `MtlArray(1:1000)` creates a GPU vector filled with `Int64` values, but within `foreachindex` we do `v[i] *= 2f0`. We are multiplying an `Int64` by a `Float32`, resulting in a `Float32` value that we try to write back into `v` - this may throw an exception, just like in normal Julia code:
```julia
julia> x = [1, 2, 3];

julia> x[1] = 42.5
ERROR: InexactError: Int64(42.5)
```
On GPUs you will see an error like this:
Note the error stack: `setindex!`, `convert`, `Int64`, `box_float32` - because of the exception check we have a type instability, which in turn results in boxing values behind pointers, which results in dynamic memory allocation, and finally the error we see at the top, `unsupported call to gpu_malloc`.
You may need to do your correctness checks manually, without exceptions; in this specific case, if we did want to cast a `Float32` to an `Int`, we could use `unsafe_trunc(T, x)` - though when using unsafe functions, be careful that you understand their behaviour and assumptions (e.g. `log` has a `DomainError` check for negative values):
```julia
function mymul!(v)
    AK.foreachindex(v) do i
        v[i] = unsafe_trunc(eltype(v), v[i] * 2.5f0)
    end
end

v = MtlArray(1:1000)
mymul!(v)
```
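For reference, `unsafe_trunc` truncates towards zero and simply skips the `InexactError` check; on the CPU it behaves like this (the result is unspecified for values that cannot fit in the target type, which is what makes it "unsafe"):

```julia
julia> unsafe_trunc(Int64, 42.5f0)
42

julia> unsafe_trunc(Int64, -3.9f0)
-3
```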
## Type Instability / Global Variables
Types must be known for variables to be captured and compiled within GPU kernels. Global variables without `const` are not type-stable, as you could bind a different value to the same name later in a script:
```julia
v = MtlArray(1:1000)

AK.foreachindex(v) do i
    v[i] *= 2
end

v = "potato"    # the global `v` could be rebound to anything later
```
The error stack is a bit more difficult here:
You see a few `dynamic function invocation` errors, an `unsupported call to gpu_malloc`, and a bit further down a `box`. The more operations you do on the type-unstable object, the more `dynamic function invocation` errors you'll see. These are also the steps base Julia would take to allow dynamically-changing objects: they would be put in a `Box` behind pointers and allocated on the heap. In a way, it is better that we cannot do that on a GPU, as it hurts performance massively.
Solving this is easy - stick the `foreachindex` call in a function, where the argument types are known at compile time:
```julia
function mymul!(v, x)
    AK.foreachindex(v) do i
        v[i] *= x
    end
end

v = MtlArray(1:1000)
mymul!(v, 2)
```
Note that Julia's lambda capture is very powerful - inside `AK.foreachindex` you can reference other objects from the enclosing function (like `x`) without explicitly passing them to the GPU.
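For instance, a kernel closure can capture several outer variables at once - a small sketch with hypothetical names, assuming the same `AK` / `Metal` setup as above:

```julia
function scale_shift!(v, a, b)
    AK.foreachindex(v) do i
        # `a` and `b` are captured by the closure from the enclosing
        # function, without being passed to the GPU explicitly
        v[i] = a * v[i] + b
    end
end

v = MtlArray{Float32}(1:1000)
scale_shift!(v, 2f0, 1f0)
```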
Note that while technically we could write `const v` in the global scope, a closure (the `do ... end` block) that references outer objects may also try to capture other variables defined in the current Julia session; as such, it is an unreliable approach for GPU kernels (example issue) that we do not recommend. Use functions.
## Apple Metal Only: Float64 is not Supported
Mac GPUs do not natively support `Float64` values; there is a high-level check when trying to create an array:
```julia
julia> x = MtlArray([1.0, 2.0, 3.0])
ERROR: Metal does not support Float64 values, try using Float32 instead
```
However, we can still run into trouble if we try to use or convert values to `Float64` inside a kernel:
```julia
function mymul!(v, x)
    AK.foreachindex(v) do i
        v[i] *= x
    end
end

v = MtlArray{Float32}(1:1000)
mymul!(v, 2.0)
```
Note that we try to multiply `Float32` values by `2.0`, which is a `Float64` - in which case we get:
```
ERROR: LoadError: Compilation to native code failed; see below for details.
[...]
caused by: NSError: Compiler encountered an internal error (AGXMetalG15X_M1, code 3)
[...]
```
Change the `2.0` to `2.0f0` or `Float32(2)`; in kernels with generic types (that are supposed to work on multiple possible input types), use the same types as your inputs, e.g. via `T = eltype(v)`, then `zero(T)`, `T(42)`, etc.
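As a minimal sketch of this pattern (with a hypothetical `generic_mul!` name), the conversion can be done once on the CPU so the kernel only ever sees the element type of its input:

```julia
function generic_mul!(v, x)
    T = eltype(v)    # match the constant's type to the input's element type
    a = T(x)         # convert once, on the CPU, before the kernel runs
    AK.foreachindex(v) do i
        v[i] *= a
    end
end

v = MtlArray{Float32}(1:1000)
generic_mul!(v, 2.0)    # the Float64 argument becomes a Float32 before launch
```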
For other library-related problems, feel free to post a GitHub issue. For help implementing new code, or just for advice, you can also use the Julia Discourse forum - the community is incredibly helpful.