Issue No. 01 - January (2011 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.143
Wladimir J. van der Laan, University of Groningen, Groningen
Andrei C. Jalba, Eindhoven University of Technology, Eindhoven
Jos B.T.M. Roerdink, University of Groningen, Groningen
The Discrete Wavelet Transform (DWT) has a wide range of applications from signal processing to video and image compression. We show that this transform, by means of the lifting scheme, can be performed in a memory- and computation-efficient way on modern, programmable GPUs, which can be regarded as massively parallel coprocessors through NVIDIA's CUDA compute paradigm. The three main hardware architectures for the 2D DWT (row-column, line-based, block-based) are shown to be unsuitable for a CUDA implementation. Our CUDA-specific design can be regarded as a hybrid method between the row-column and block-based methods. We achieve considerable speedups compared to an optimized CPU implementation and earlier non-CUDA-based GPU DWT methods, both for 2D images and 3D volume data. Additionally, memory usage can be reduced significantly compared to previous GPU DWT methods. The method is scalable and the fastest GPU implementation among the methods considered. A performance analysis shows that the results of our CUDA-specific design are in close agreement with our theoretical complexity analysis.
Discrete wavelet transform, wavelet lifting, graphics hardware, CUDA.
W. J. van der Laan, A. C. Jalba and J. B. T. M. Roerdink, "Accelerating Wavelet Lifting on Graphics Hardware Using CUDA," in IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 1, pp. 132-146, 2011.