Paper Review of "hiCUDA: A High-level Directive-based Language for GPU Programming"
Basic Concept
hiCUDA suggests a new approach for general purpose GPU programming. Traditionally, programmers were expected to identify parallelizable code within their program, manually write GPU kernel code and manage data transfer between the host and GPUs. Obviously, this is an extremely tedious and error-prone task. With hiCUDA, programmers still need to identify parallelizable code blocks but they no longer need to hand-operate everything else. This is achieved by the #pragma
directive available in the C/C++ preprocessor. There are two main aspects to focus on: computation and data.
For those who would like to try out hiCUDA project page can be found here.
Computation Model
hiCUDA provides a few directives to control how things are computed. The kernel
directive denotes a region in which can be converted to a CUDA kernel.
#pragma hicuda kernel ...
#pragma hicuda kernel_end
The loop_partition
directive provides fine-grained control on how loops can be distributed over threads or thread blocks.
#pragma hicuda loop_partition
Finally, the singular
directive allows programmers to identify kernel code to be executed only once in a thread block.
#pragma hicuda singular
#pragma hicuda singular_end
Data Model
hiCUDA provides three main directives: global
, constant
and shared
. Besides those three directives, there is an auxiliary directive to specify dimensions of arrays.
#pragma hicuda shape ptr-var-sym {[dim-size]}+
Each of those three main directives supports one or more of the following operations:
alloc
copyin
copyout
free
remove
Most of operations are self-explanatory except it is not entirely clear why global
is paired with free
while others are paired with remove
.
Weaknesses
Although hiCUDA has done an excellent job in providing an abstraction layer for GPU programming, there are still some room for improvement.
- C/C++ only; a different approach is required for other languages with no preprocessor.
- Code inside a kernel region must be primitive; this prevents programmers to fully utilize rich libraries and frameworks available on the host.
- Texture memory cannot be accessed.