Basic Concept

hiCUDA suggests a new approach for general purpose GPU programming. Traditionally, programmers were expected to identify parallelizable code within their program, manually write GPU kernel code and manage data transfer between the host and GPUs. Obviously, this is an extremely tedious and error-prone task. With hiCUDA, programmers still need to identify parallelizable code blocks but they no longer need to hand-operate everything else. This is achieved by the #pragma directive available in the C/C++ preprocessor. There are two main aspects to focus on: computation and data.

Example code of hiCUDA

For those who would like to try out hiCUDA project page can be found here.

Computation Model

hiCUDA provides a few directives to control how things are computed. The kernel directive denotes a region in which can be converted to a CUDA kernel.

#pragma hicuda kernel ...
#pragma hicuda kernel_end

The loop_partition directive provides fine-grained control on how loops can be distributed over threads or thread blocks.

#pragma hicuda loop_partition

Finally, the singular directive allows programmers to identify kernel code to be executed only once in a thread block.

#pragma hicuda singular
#pragma hicuda singular_end

Data Model

hiCUDA provides three main directives: global, constant and shared. Besides those three directives, there is an auxiliary directive to specify dimensions of arrays.

#pragma hicuda shape ptr-var-sym {[dim-size]}+

Each of those three main directives supports one or more of the following operations:

  • alloc
  • copyin
  • copyout
  • free
  • remove

Most of operations are self-explanatory except it is not entirely clear why global is paired with free while others are paired with remove.

Weaknesses

Although hiCUDA has done an excellent job in providing an abstraction layer for GPU programming, there are still some room for improvement.

  • C/C++ only; a different approach is required for other languages with no preprocessor.
  • Code inside a kernel region must be primitive; this prevents programmers to fully utilize rich libraries and frameworks available on the host.
  • Texture memory cannot be accessed.