Issue No.02 - February (2011 vol.22)
pp: 337-351
Leila Ismail , UAE University, Al-Ain
Driss Guerchi , UAE University, Al-Ain
ABSTRACT
Convolution represents a major computational load for many scientific and engineering applications, including seismic surface simulations and seismic imaging, so increasing its efficiency can significantly enhance the performance of these applications. In this work, we present an in-depth analysis of the convolution algorithm and its complexity in order to develop adequate parallel algorithms. The implementation of these algorithms and their evaluation on the IBM Cell Broadband Engine (BE) processor reveal the gains and losses achieved by parallelizing the direct convolution. The performance results show that, despite the complexity of the convolution processing, a speedup of at least 71.4 is obtained. Developing the parallel vectorized algorithm requires considering three independent vectorization strategies. Given the wide availability of Cell processors, the proposed parallelization approach can be widely adopted by any convolution-based application.
INDEX TERMS
Parallel computing, IBM Cell BE, convolution, performance.
CITATION
Leila Ismail, Driss Guerchi, "Performance Evaluation of Convolution on the Cell Broadband Engine Processor", IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 2, pp. 337-351, February 2011, doi:10.1109/TPDS.2010.70