hiCUDA: High-Level GPGPU Programming
January 2011 (vol. 22 no. 1)
pp. 78-90
Tianyi David Han, University of Toronto, Toronto
Tarek S. Abdelrahman, University of Toronto, Toronto
Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner, applying them directly to the sequential code, thus speeding up the porting process. In this paper, we describe the hiCUDA directives as well as the design and implementation of a prototype compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real-world applications that span multiple procedures and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.
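The directive style the abstract describes can be sketched as follows. The pragma spellings below (global alloc/copyout/free, kernel, loop_partition) are an approximation reconstructed from the paper's description, not its exact grammar, and the block and thread dimensions are illustrative. Because hiCUDA directives are C pragmas, a compiler that does not recognize them simply ignores them and the annotated code still runs as the original sequential program.

```c
#include <assert.h>

/* Sequential matrix multiply annotated with hiCUDA-style directives.
 * NOTE: the pragma syntax is approximated for illustration; a plain C
 * compiler ignores unknown pragmas, so this compiles and runs as
 * ordinary sequential code. */
void matmul(const float *a, const float *b, float *c, int n)
{
    /* Allocate GPU copies of the arrays; copy the inputs in. */
    #pragma hicuda global alloc a[*] copyin
    #pragma hicuda global alloc b[*] copyin
    #pragma hicuda global alloc c[*]

    /* Package the loop nest as a GPU kernel: a 2x2 grid of thread
     * blocks, each with 16x16 threads (illustrative sizes). */
    #pragma hicuda kernel matmul_kernel tblock(2,2) thread(16,16)
    #pragma hicuda loop_partition over_tblock over_thread
    for (int i = 0; i < n; i++) {
        #pragma hicuda loop_partition over_tblock over_thread
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
    #pragma hicuda kernel_end

    /* Copy the result back to the host and free the GPU copies. */
    #pragma hicuda global copyout c[*]
    #pragma hicuda global free a b c
}
```

The point of the sketch is the porting model: the directives are layered onto the unchanged sequential loop nest, and the hiCUDA compiler, rather than the programmer, performs the outlining, data transfer, and kernel-launch code generation.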

[1] NVIDIA, "NVIDIA GeForce 8800 GPU Architecture Overview," Nov. 2006.
[2] NVIDIA, "NVIDIA CUDA Programming Guide v1.1," Nov. 2007.
[3] I. Buck et al., "Brook for GPUs: Stream Computing on Graphics Hardware," Proc. ACM SIGGRAPH, pp. 777-786, 2004.
[4] "Open Computing Language (OpenCL)," http://www.khronos.org/opencl/, 2010.
[5] S. Ryoo et al., "Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA," Proc. Symp. Principles and Practice of Parallel Programming, pp. 73-82, 2008.
[6] NVIDIA, "The CUDA Compiler Driver NVCC v1.1," 2007.
[7] S. Ryoo et al., "Program Optimization Space Pruning for a Multithreaded GPU," Proc. Int'l Symp. Code Generation and Optimization, pp. 195-204, 2008.
[8] ISO/IEC 14882:2003, "Information Technology—Programming Languages—C++," ISO, 2003.
[9] T.D. Han, "Directive-Based General-Purpose GPU Programming," master's thesis, Univ. of Toronto, Sept. 2009.
[10] C. Liao et al., "Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization," Proc. Int'l Workshop Languages and Compilers for Parallel Computing, Oct. 2009.
[11] S. Muchnick, Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
[12] J. Fabri, "Automatic Storage Optimization," Proc. Symp. Compiler Construction, pp. 83-91, 1979.
[13] A. Clementson and C. Elphick, "Approximate Coloring Algorithms for Composite Graphs," J. Operational Research Soc., vol. 34, no. 6, pp. 503-509, 1983.
[14] "Open64 Research Compiler," 2010.
[15] IMPACT Research Group, "The Parboil Benchmark Suite," 2007.
[16] L. Wang, S. Jacques, and L. Zheng, "MCML—Monte Carlo Modeling of Light Transport in Multi-Layered Tissues," Computer Methods and Programs in Biomedicine, vol. 47, no. 2, pp. 131-146, 1995.
[17] A. Klöckner, "PyCUDA v0.94beta Documentation," http://documen.tician.de/pycuda/, 2010.
[18] GASS, "jCUDA: Java for CUDA," 2010.
[19] The Portland Group, "CUDA Fortran Programming Guide and Reference v0.9," June 2009.
[20] S. Lee, S.-J. Min, and R. Eigenmann, "OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization," Proc. Symp. Principles and Practice of Parallel Programming, pp. 101-110, 2009.
[21] OpenMP ARB, "OpenMP Specification v3.0," http://openmp.org/wp/openmp-specifications/, May 2008.
[22] The Portland Group, "PGI Fortran and C Accelerator Programming Model," June 2009.
[23] S.-Z. Ueng et al., "CUDA-lite: Reducing GPU Programming Complexity," Proc. Int'l Workshop Languages and Compilers for Parallel Computing, pp. 1-15, 2008.
[24] C.-K. Luk, S. Hong, and H. Kim, "Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping," Proc. Int'l Symp. Microarchitecture, pp. 45-55, 2009.
[25] M.M. Baskaran et al., "A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs," Proc. Int'l Conf. Supercomputing, pp. 225-234, 2008.

Index Terms:
CUDA, GPGPU, data-parallel programming, directive-based language, source-to-source compiler.
Tianyi David Han, Tarek S. Abdelrahman, "hiCUDA: High-Level GPGPU Programming," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 78-90, Jan. 2011, doi:10.1109/TPDS.2010.62