Reflection

Because it uses a different compilation toolchain, CUDAnative.jl offers counterparts to the code_ reflection functionality from Base:

CUDAnative.code_llvm — Function.
code_llvm([io], f, types; optimize=true, dump_module=false, cap::VersionNumber)

Prints the LLVM IR generated for the method matching the given generic function and type signature to io which defaults to STDOUT. The IR is optimized according to optimize (defaults to true), and the entire module, including headers and other functions, is dumped if dump_module is set (defaults to false). The device capability cap to generate code for defaults to the current active device's capability, or v"2.0" if there is no such active context.

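For example, a minimal sketch of inspecting the IR of a trivial device function (the increment function below is a hypothetical example, not part of CUDAnative.jl):

using CUDAnative

# hypothetical device function: add 1 to the i-th element behind a raw pointer
function increment(ptr)
    i = threadIdx().x
    unsafe_store!(ptr, unsafe_load(ptr, i) + 1f0, i)
    return
end

# optimized IR for the Float32 instantiation, printed to STDOUT
CUDAnative.code_llvm(increment, Tuple{Ptr{Float32}})

# unoptimized IR, dumping the entire module, for a specific device capability
CUDAnative.code_llvm(STDOUT, increment, Tuple{Ptr{Float32}};
                     optimize=false, dump_module=true, cap=v"3.5")
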
CUDAnative.code_ptx — Function.
code_ptx([io], f, types; cap::VersionNumber, kernel::Bool=false)

Prints the PTX assembly generated for the method matching the given generic function and type signature to io which defaults to STDOUT. The device capability cap to generate code for defaults to the current active device's capability, or v"2.0" if there is no such active context. The optional kernel parameter indicates whether the function in question is an entry-point function, or a regular device function.

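As an illustration, the following sketch (memset is a hypothetical device function) prints PTX both for a plain device function and for an entry-point kernel targeting a specific capability:

using CUDAnative

# hypothetical device function: store a value at the i-th slot behind a pointer
function memset(ptr, val)
    i = threadIdx().x
    unsafe_store!(ptr, val, i)
    return
end

# PTX for a regular device function
CUDAnative.code_ptx(memset, Tuple{Ptr{Float32}, Float32})

# PTX for the same method compiled as an entry-point kernel, targeting sm_35
CUDAnative.code_ptx(STDOUT, memset, Tuple{Ptr{Float32}, Float32};
                    kernel=true, cap=v"3.5")
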
CUDAnative.code_sass — Function.
code_sass([io], f, types, cap::VersionNumber)

Prints the SASS code generated for the method matching the given generic function and type signature to io which defaults to STDOUT. The device capability cap to generate code for defaults to the current active device's capability, or v"2.0" if there is no such active context.

Note that the method needs to be a valid entry-point kernel, i.e., it should not return any values.

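For example, a sketch using the same hypothetical memset function as above, which qualifies as an entry-point kernel because it returns nothing:

using CUDAnative

# hypothetical entry-point kernel: no return value, as required for SASS inspection
function memset(ptr, val)
    i = threadIdx().x
    unsafe_store!(ptr, val, i)
    return
end

CUDAnative.code_sass(memset, Tuple{Ptr{Float32}, Float32})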

Convenience macros

For ease of use, CUDAnative.jl also implements @code_ macros wrapping the above reflection functionality. These macros determine the types of the arguments (taking into account GPU type conversions) and call the underlying code_ function. In addition, they understand the @cuda invocation syntax, so you can conveniently put them in front of an existing @cuda invocation.
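
For example, a minimal sketch (the vadd kernel is hypothetical, and the exact @cuda launch syntax may differ between CUDAnative.jl versions):

using CUDAnative, CUDAdrv

# hypothetical kernel adding two vectors element-wise
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

d_a = CuArray(rand(Float32, 16))
d_b = CuArray(rand(Float32, 16))
d_c = CuArray(zeros(Float32, 16))

# applied to a plain call: reflects on the GPU-converted argument types
CUDAnative.@code_typed vadd(d_a, d_b, d_c)

# applied to a kernel launch: same reflection, using kernel code generation conventions
CUDAnative.@code_ptx @cuda (1, 16) vadd(d_a, d_b, d_c)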

CUDAnative.@code_lowered — Macro.

Extracts the relevant function call from any @cuda invocation, evaluates the arguments to the function or macro call, determines their types (taking into account GPU-specific type conversions), and calls code_lowered on the resulting expression. Can be applied to a pure function call, or to a call prefixed with the @cuda macro, in which case kernel code generation conventions are used (with respect to argument conversions, return values, etc.).

CUDAnative.@code_typed — Macro.

Extracts the relevant function call from any @cuda invocation, evaluates the arguments to the function or macro call, determines their types (taking into account GPU-specific type conversions), and calls code_typed on the resulting expression. Can be applied to a pure function call, or to a call prefixed with the @cuda macro, in which case kernel code generation conventions are used (with respect to argument conversions, return values, etc.).

CUDAnative.@code_warntype — Macro.

Extracts the relevant function call from any @cuda invocation, evaluates the arguments to the function or macro call, determines their types (taking into account GPU-specific type conversions), and calls code_warntype on the resulting expression. Can be applied to a pure function call, or to a call prefixed with the @cuda macro, in which case kernel code generation conventions are used (with respect to argument conversions, return values, etc.).

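For example, a sketch of spotting a type instability in a kernel (the scale function and the non-constant global below are hypothetical):

using CUDAnative, CUDAdrv

factor = 2  # non-constant global: a classic source of type instability

# hypothetical kernel multiplying each element by the global factor
function scale(a)
    i = threadIdx().x
    a[i] *= factor
    return
end

d_a = CuArray(rand(Float32, 16))

# prints the GPU-specialized code, highlighting abstractly-typed expressions
CUDAnative.@code_warntype scale(d_a)
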
CUDAnative.@code_llvm — Macro.

Extracts the relevant function call from any @cuda invocation, evaluates the arguments to the function or macro call, determines their types (taking into account GPU-specific type conversions), and calls code_llvm on the resulting expression. Can be applied to a pure function call, or to a call prefixed with the @cuda macro, in which case kernel code generation conventions are used (with respect to argument conversions, return values, etc.).

CUDAnative.@code_ptx — Macro.

Extracts the relevant function call from any @cuda invocation, evaluates the arguments to the function or macro call, determines their types (taking into account GPU-specific type conversions), and calls code_ptx on the resulting expression. Can be applied to a pure function call, or to a call prefixed with the @cuda macro, in which case kernel code generation conventions are used (with respect to argument conversions, return values, etc.).

CUDAnative.@code_sass — Macro.

Extracts the relevant function call from any @cuda invocation, evaluates the arguments to the function or macro call, determines their types (taking into account GPU-specific type conversions), and calls code_sass on the resulting expression. Can be applied to a pure function call, or to a call prefixed with the @cuda macro, in which case kernel code generation conventions are used (with respect to argument conversions, return values, etc.).
