Triangular Matrix Inversion on Heterogeneous Multicore Systems
January 2012 (vol. 23 no. 1)
pp. 177-184
R. Guerrieri, Adv. Res. Center on Electron. Syst. for Inf. & Commun. Technol., E. De Castro (ARCES), Bologna, Italy
T. De Marco, Adv. Res. Center on Electron. Syst. for Inf. & Commun. Technol., E. De Castro (ARCES), Bologna, Italy
F. Ries, Adv. Res. Center on Electron. Syst. for Inf. & Commun. Technol., E. De Castro (ARCES), Bologna, Italy
Dense matrix inversion is a basic procedure in many linear algebra algorithms. Any factorization-based dense matrix inversion algorithm involves the inversion of one or two triangular matrices. In this work, we present an improved implementation of a parallel triangular matrix inversion for heterogeneous multicore CPU/dual-GPU systems.
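To illustrate why triangular inversion is the building block here: with an LU factorization A = LU the inverse is U^{-1} L^{-1} (two triangular inverses), while with a Cholesky factorization A = L L^T only L has to be inverted. The following is a minimal sequential sketch of the textbook recursive block scheme for inverting a lower triangular matrix; it is not the authors' CPU/GPU implementation, and the names `matmul` and `invert_lower` are hypothetical helpers introduced only for this example.

```python
def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def invert_lower(L):
    """Invert a nonsingular lower triangular matrix by recursive 2x2 blocking.

    For L = [[A, 0], [C, B]] with A and B lower triangular, the inverse is
    [[A^-1, 0], [-B^-1 C A^-1, B^-1]].
    """
    n = len(L)
    if n == 1:
        return [[1.0 / L[0][0]]]
    k = n // 2
    A = [row[:k] for row in L[:k]]   # top-left block (lower triangular)
    C = [row[:k] for row in L[k:]]   # bottom-left block (dense)
    B = [row[k:] for row in L[k:]]   # bottom-right block (lower triangular)
    A_inv = invert_lower(A)          # independent of B_inv
    B_inv = invert_lower(B)          # independent of A_inv
    # Off-diagonal block of the inverse: -B^-1 C A^-1
    D = [[-v for v in row] for row in matmul(matmul(B_inv, C), A_inv)]
    top = [row + [0.0] * (n - k) for row in A_inv]
    bottom = [d_row + b_row for d_row, b_row in zip(D, B_inv)]
    return top + bottom


L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, 5.0, 8.0]]
L_inv = invert_lower(L)
product = matmul(L, L_inv)  # should be close to the 3x3 identity
```

In this scheme the two recursive calls do not depend on each other and the remaining work is dense matrix multiplication, which is the kind of structure a parallel implementation can exploit.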

Index Terms:
parallel processing, computer graphic equipment, coprocessors, linear algebra, matrix inversion, multiprocessing systems, heterogeneous dual-GPU system, linear algebra algorithm, factorization-based dense matrix inversion algorithm, parallel triangular matrix inversion, heterogeneous multicore CPU system, Graphics processing unit, Instruction sets, Kernel, Random access memory, Indexes, Table lookup
Citation:
R. Guerrieri, T. De Marco, F. Ries, "Triangular Matrix Inversion on Heterogeneous Multicore Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 1, pp. 177-184, Jan. 2012, doi:10.1109/TPDS.2011.103