Issue No.03 - March (2008 vol.19)
pp: 299-310
The widespread usage of the Discrete Wavelet Transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the most popular schemes, known as Filter Bank (FBS) and Lifting (LS), and have always concluded that Lifting is the most efficient option. However, there is no such study on streaming processors such as modern Graphics Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, as well as the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS in current-generation GPUs. In our experiments, the actual FBS gains range between 10% and 140%, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future-generation GPUs.
Graphics processors, Parallel processing, Parallel algorithms, Parallel and vector implementations, Wavelets and fractals, SIMD processors, Optimization
Javier Setoain, Manuel Prieto, Luis Piñuel, Francisco Tirado, "Parallel Implementation of the 2D Discrete Wavelet Transform on Graphics Processing Units: Filter Bank versus Lifting", IEEE Transactions on Parallel & Distributed Systems, vol.19, no. 3, pp. 299-310, March 2008, doi:10.1109/TPDS.2007.70716