
Olivier Beaumont, Vincent Boudet, Fabrice Rastello, Yves Robert, "Matrix Multiplication on Heterogeneous Platforms," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 10, pp. 1033-1051, October 2001. ISSN 1045-9219. DOI: 10.1109/71.963416. IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: parallel algorithms, load balancing, communication volume, matrix multiplication, numerical linear algebra libraries, heterogeneous platforms, cluster computing, metacomputing.
Abstract: In this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to balance the load across resources of different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic and we assess its practical usefulness through MPI experiments.
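The abstract does not spell out the geometric formulation or the heuristic, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes that each processor is assigned a rectangle of the unit square whose area is proportional to its speed (load balance), that communication volume is modeled by the sum of the rectangles' half-perimeters (width plus height), and that a column-based layout stacks rectangles of consecutive sorted areas into full-height columns. The function names `column_partition` and `lower_bound` are hypothetical.

```python
from math import sqrt


def column_partition(speeds):
    """Arrange one rectangle per processor into columns of the unit square.

    Assumed cost model: sum over rectangles of (width + height).
    Returns (cost, columns), where columns is a list of lists of areas.
    """
    total = sum(speeds)
    areas = sorted(s / total for s in speeds)  # normalized: areas sum to 1
    p = len(areas)

    # prefix[q] = total area of the q smallest rectangles
    prefix = [0.0]
    for a in areas:
        prefix.append(prefix[-1] + a)

    # best[q] = minimal cost of placing the first q areas into full columns.
    # A column holding k rectangles with total area w has width w; its
    # rectangles' heights sum to 1 and their widths sum to k * w.
    best = [float("inf")] * (p + 1)
    cut = [0] * (p + 1)
    best[0] = 0.0
    for q in range(1, p + 1):
        for r in range(q):
            k = q - r
            w = prefix[q] - prefix[r]
            cost = best[r] + 1.0 + k * w
            if cost < best[q]:
                best[q], cut[q] = cost, r

    # Walk the recorded cuts backwards to recover the columns.
    columns = []
    q = p
    while q > 0:
        columns.append(areas[cut[q]:q])
        q = cut[q]
    columns.reverse()
    return best[p], columns


def lower_bound(speeds):
    """Cost if every rectangle could be a perfect square: 2 * sum(sqrt(area))."""
    total = sum(speeds)
    return 2 * sum(sqrt(s / total) for s in speeds)


if __name__ == "__main__":
    cost, columns = column_partition([1, 1, 1, 1])
    print(cost, columns, lower_bound([1, 1, 1, 1]))
```

For four processors of equal speed, this sketch picks two columns of two squares each, which meets the all-squares lower bound; with unequal speeds the two values generally differ, and the gap is the kind of quantity a performance guarantee would bound.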