Issue No.01 - January (2012 vol.23)
pp: 177-184
R. Guerrieri, Adv. Res. Center on Electron. Syst. for Inf. & Commun. Technol., E. De Castro (ARCES), Bologna, Italy
T. De Marco, Adv. Res. Center on Electron. Syst. for Inf. & Commun. Technol., E. De Castro (ARCES), Bologna, Italy
F. Ries, Adv. Res. Center on Electron. Syst. for Inf. & Commun. Technol., E. De Castro (ARCES), Bologna, Italy
Dense matrix inversion is a basic procedure in many linear algebra algorithms. Any factorization-based dense matrix inversion algorithm involves the inversion of one or two triangular matrices. In this work, we present an improved implementation of a parallel triangular matrix inversion for heterogeneous multicore CPU/dual-GPU systems.
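To illustrate the triangular inversion step mentioned above, here is a minimal Python sketch of the textbook recursive block scheme for a lower triangular matrix. It is not the CPU/dual-GPU implementation presented in this work. The function names `matmul` and `invert_lower` are hypothetical, and the code runs serially on plain lists. It assumes a nonsingular matrix L split as [[L11, 0], [L21, L22]], whose inverse is [[inv(L11), 0], [-inv(L22) L21 inv(L11), inv(L22)]].

```python
def matmul(a, b):
    """Plain dense product of two matrices stored as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def invert_lower(l):
    """Invert a nonsingular lower triangular matrix by recursive 2x2 blocking.

    Illustrative sketch only: serial, no pivoting, no singularity check.
    """
    n = len(l)
    if n == 1:
        return [[1.0 / l[0][0]]]
    h = n // 2
    # Split L into its three nonzero blocks.
    l11 = [row[:h] for row in l[:h]]
    l21 = [row[:h] for row in l[h:]]
    l22 = [row[h:] for row in l[h:]]
    # The two diagonal blocks are inverted independently of each other.
    x11 = invert_lower(l11)
    x22 = invert_lower(l22)
    # Off-diagonal block: X21 = -inv(L22) * L21 * inv(L11).
    x21 = [[-v for v in row] for row in matmul(matmul(x22, l21), x11)]
    top = [row + [0.0] * (n - h) for row in x11]
    bottom = [r21 + r22 for r21, r22 in zip(x21, x22)]
    return top + bottom
```

In this formulation the two diagonal-block inversions do not depend on each other, and the remaining work consists of matrix products. This is one reason recursive schemes of this kind are commonly considered for parallel hardware.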
parallel processing, computer graphic equipment, coprocessors, linear algebra, matrix inversion, multiprocessing systems, heterogeneous dual-GPU system, linear algebra algorithm, factorization-based dense matrix inversion algorithm, parallel triangular matrix inversion, heterogeneous multicore CPU system, Graphics processing unit, Instruction sets, Kernel, Random access memory, Indexes, Table lookup
R. Guerrieri, T. De Marco, F. Ries, "Triangular Matrix Inversion on Heterogeneous Multicore Systems", IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. 1, pp. 177-184, January 2012, doi:10.1109/TPDS.2011.103