Issue No.01 - January (2011 vol.22)

pp: 132-146

Andrei C. Jalba, Eindhoven University of Technology, Eindhoven

Jos B.T.M. Roerdink, University of Groningen, Groningen

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.143

ABSTRACT

The Discrete Wavelet Transform (DWT) has a wide range of applications, from signal processing to video and image compression. We show that this transform, by means of the lifting scheme, can be performed in a memory- and computation-efficient way on modern, programmable GPUs, which can be regarded as massively parallel coprocessors through NVIDIA's CUDA compute paradigm. The three main hardware architectures for the 2D DWT (row-column, line-based, block-based) are shown to be unsuitable for a CUDA implementation. Our CUDA-specific design can be regarded as a hybrid between the row-column and block-based methods. We achieve considerable speedups compared to an optimized CPU implementation and to earlier non-CUDA-based GPU DWT methods, both for 2D images and 3D volume data. Additionally, memory usage can be reduced significantly compared to previous GPU DWT methods. The method is scalable and is the fastest GPU implementation among the methods considered. A performance analysis shows that the results of our CUDA-specific design are in close agreement with our theoretical complexity analysis.
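As a rough illustration of what "wavelet lifting" means, the following minimal sketch computes one level of a 1D integer 5/3 (LeGall) transform in plain Python and inverts it. It is not the authors' CUDA implementation: the choice of the 5/3 filter, the mirrored boundary handling, the even-length restriction, and the function names `lifting_53_forward` and `lifting_53_inverse` are all assumptions made for this example.

```python
def lifting_53_forward(x):
    """One level of an integer 5/3 wavelet transform via lifting (illustrative)."""
    if len(x) < 2 or len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even length of at least 2")
    even = list(x[0::2])
    odd = list(x[1::2])
    half = len(even)
    # Predict step: each odd sample minus the floored average of its two
    # even neighbours (the right neighbour is mirrored at the boundary).
    detail = [
        odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
        for i in range(half)
    ]
    # Update step: each even sample plus a rounded quarter of the two
    # neighbouring detail values (the left neighbour is mirrored at the boundary).
    approx = [
        even[i] + ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
        for i in range(half)
    ]
    return approx, detail


def lifting_53_inverse(approx, detail):
    """Undo lifting_53_forward by running the steps in reverse with flipped signs."""
    half = len(approx)
    # Undo the update step to recover the even samples.
    even = [
        approx[i] - ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
        for i in range(half)
    ]
    # Undo the predict step to recover the odd samples.
    odd = [
        detail[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
        for i in range(half)
    ]
    # Interleave even and odd samples back into the original order.
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x


if __name__ == "__main__":
    signal = [3, 7, 1, 8, 2, 9, 4, 6]
    approx, detail = lifting_53_forward(signal)
    print(approx, detail)
    print(lifting_53_inverse(approx, detail) == signal)
```

Each lifting step modifies one half of the samples using only the other half, so the inverse simply subtracts what the forward step added, and the transform can be computed in place with integer arithmetic. In a row-column 2D scheme, a 1D pass of this kind would be applied along the rows and then along the columns.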

INDEX TERMS

Discrete wavelet transform, wavelet lifting, graphics hardware, CUDA.

CITATION

Andrei C. Jalba, Jos B.T.M. Roerdink, "Accelerating Wavelet Lifting on Graphics Hardware Using CUDA",

*IEEE Transactions on Parallel & Distributed Systems*, vol. 22, no. 1, pp. 132-146, January 2011, doi:10.1109/TPDS.2010.143