Issue No.01 - January (2011 vol.22)
pp: 91-104
In Kyu Park, Inha University, Incheon
Nitin Singhal, Samsung Electronics Co., Ltd., Suwon
Man Hee Lee, Inha University, Incheon
Sungdae Cho, Samsung Electronics Co., Ltd., Suwon
Chris W. Kim, NVIDIA Corporation, Seoul
In this paper, we examine key factors in the design and evaluation of image processing algorithms on massively parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model. A set of metrics, customized for image processing, is proposed to quantitatively evaluate algorithm characteristics. In addition, using multiview stereo matching, linear feature extraction, JPEG2000 image encoding, and nonphotorealistic rendering (NPR) as example applications, we show that a range of image processing algorithms map readily to CUDA. The algorithms are carefully selected from major domains of image processing, so they inherently contain a variety of subalgorithms with diverse characteristics when implemented on the GPU. Performance is evaluated in terms of execution time and is compared to the fastest host-only version implemented using OpenMP. The observed speedup is shown to vary extensively depending on the characteristics of each algorithm. An intensive analysis is conducted to show the appropriateness of the proposed metrics in predicting the effectiveness of an application for parallel implementation.
GPU, CUDA, image processing, parallel implementation, GPGPU.
In Kyu Park, Nitin Singhal, Man Hee Lee, Sungdae Cho, Chris W. Kim, "Design and Performance Evaluation of Image Processing Algorithms on GPUs", IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 1, pp. 91-104, January 2011, doi:10.1109/TPDS.2010.115