Reflection

Because CUDAnative.jl uses a different compilation toolchain, it offers counterparts to the code_ reflection functionality from Base:

CUDAnative.code_llvm (Function)
code_llvm([io], f, types; optimize=true, cap::VersionNumber, kernel=false,
                          dump_module=false, strip_ir_metadata=true)

Prints the device LLVM IR generated for the method matching the given generic function and type signature to io, which defaults to stdout. The IR is optimized according to optimize (defaults to true), which includes entry-point-specific optimizations if kernel is set (defaults to false). The device capability cap to generate code for defaults to the capability of the currently active device, or v"2.0" if there is no active context. The entire module, including headers and other functions, is dumped if dump_module is set (defaults to false). Finally, setting strip_ir_metadata removes all debug metadata (defaults to true).

See also: @device_code_llvm, InteractiveUtils.code_llvm
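As a minimal sketch of how this might be used (the device function `add` below is hypothetical, and `cap` is passed explicitly so the call works even without an active CUDA context):

```julia
using CUDAnative

# Hypothetical device function: plain integer addition.
add(x, y) = x + y

# Print the optimized device LLVM IR for this signature.
CUDAnative.code_llvm(add, Tuple{Int32,Int32}; cap=v"3.5")
```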

CUDAnative.code_ptx (Function)
code_ptx([io], f, types; cap::VersionNumber, kernel=false, strip_ir_metadata=true)

Prints the PTX assembly generated for the method matching the given generic function and type signature to io, which defaults to stdout. The device capability cap to generate code for defaults to the capability of the currently active device, or v"2.0" if there is no active context. The optional kernel parameter indicates whether the function in question is an entry-point function or a regular device function. Finally, setting strip_ir_metadata removes all debug metadata (defaults to true).

See also: @device_code_ptx
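The call shape mirrors code_llvm; a brief sketch (the device function `mul` is hypothetical):

```julia
using CUDAnative

# Hypothetical device function.
mul(x, y) = x * y

# Emit PTX assembly for the signature; kernel=false (the default) treats
# it as a regular device function rather than an entry point.
CUDAnative.code_ptx(mul, Tuple{Float32,Float32}; cap=v"3.5")
```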

CUDAnative.code_sass (Function)
code_sass([io], f, types, cap::VersionNumber)

Prints the SASS code generated for the method matching the given generic function and type signature to io, which defaults to stdout. The device capability cap to generate code for defaults to the capability of the currently active device, or v"2.0" if there is no active context. The method needs to be a valid entry-point kernel, e.g. it should not return any values.

See also: @device_code_sass
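A sketch of a valid invocation (the kernel `sass_kernel` is hypothetical; note that it returns nothing, as required of an entry point, and that generating SASS involves the device compiler, so a functional CUDA setup is assumed):

```julia
using CUDAnative

# Hypothetical entry-point kernel: takes an argument, returns nothing.
function sass_kernel(x)
    return
end

# Print the SASS code for this kernel at the given device capability.
CUDAnative.code_sass(sass_kernel, Tuple{Int64}, v"3.5")
```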


Convenience macros

For ease of use, CUDAnative.jl also implements @device_code_ macros that wrap the reflection functionality above. These macros evaluate their expression argument while tracing compilation, and finally print or return the code for every invoked CUDA kernel. Do note that this evaluation can have side effects, in contrast to the similarly-named @code_ macros in Base, which are free of side effects.

@device_code_lowered ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_lowered for every compiled CUDA kernel.

See also: InteractiveUtils.@code_lowered

@device_code_typed ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_typed for every compiled CUDA kernel.

See also: InteractiveUtils.@code_typed

@device_code_warntype [io::IO=stdout] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_warntype to io for every compiled CUDA kernel.

See also: InteractiveUtils.@code_warntype

@device_code_llvm [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_llvm to io for every compiled CUDA kernel. For other supported keywords, see CUDAnative.code_llvm.

See also: InteractiveUtils.@code_llvm
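A usage sketch, assuming a functional GPU and CuArrays.jl for host-side device arrays (the kernel `fill_kernel` is hypothetical). Unlike Base's @code_llvm, the wrapped launch actually executes:

```julia
using CUDAnative, CuArrays

# Hypothetical kernel: each thread writes a constant into its slot.
function fill_kernel(a, val)
    a[threadIdx().x] = val
    return
end

a = CuArray(zeros(Float32, 4))

# Launch the kernel while tracing compilation, printing the device
# LLVM IR of every kernel compiled during evaluation.
@device_code_llvm @cuda threads=4 fill_kernel(a, 1f0)
```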

@device_code_ptx [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of CUDAnative.code_ptx to io for every compiled CUDA kernel. For other supported keywords, see CUDAnative.code_ptx.

@device_code_sass [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of CUDAnative.code_sass to io for every compiled CUDA kernel. For other supported keywords, see CUDAnative.code_sass.

@device_code dir::AbstractString=... [...] ex

Evaluates the expression ex and dumps all intermediate forms of code to the directory dir.
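A brief sketch, again assuming a functional GPU and CuArrays.jl (the kernel `noop` and the output directory name are hypothetical):

```julia
using CUDAnative, CuArrays

# Hypothetical kernel that writes a single element.
noop(x) = (x[1] = 0f0; return)

a = CuArray(zeros(Float32, 1))

# Every intermediate form (typed IR, LLVM IR, PTX, ...) of each kernel
# compiled while evaluating the launch is written to files under ./devcode.
@device_code dir="./devcode" @cuda noop(a)
```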
