Matrix Multiplication on Heterogeneous Platforms
October 2001 (vol. 12 no. 10)
pp. 1033-1051

Abstract: In this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to balance the workload among resources of different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a polynomial-time column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic and assess its practical usefulness through MPI experiments.
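
The geometric framework can be illustrated with a small sketch. It assumes the usual setting for this problem: processor i receives a rectangle of the unit square whose area s_i is proportional to its speed, and the communication volume is proportional to the sum of the rectangles' half-perimeters. A column-based layout stacks rectangles in vertical columns of height 1, so a column holding k rectangles of total area w has width w and contributes k*w + 1 to the cost. The Python below is a minimal sketch, not the authors' implementation. The names normalize, lower_bound and column_partition are hypothetical. As a simplifying assumption, it restricts each column to consecutive processors after sorting by area. Within that restriction, a dynamic program over the split points finds the cheapest layout. The lower bound uses the fact that a rectangle of area s has half-perimeter at least 2*sqrt(s).

```python
from itertools import accumulate
from math import sqrt
from typing import List, Tuple


def normalize(speeds: List[float]) -> List[float]:
    """Scale relative speeds so the rectangle areas sum to 1 (the unit square)."""
    total = sum(speeds)
    return [v / total for v in speeds]


def lower_bound(speeds: List[float]) -> float:
    """Any rectangle of area s has half-perimeter >= 2*sqrt(s) (attained by a square)."""
    return 2.0 * sum(sqrt(s) for s in normalize(speeds))


def column_partition(speeds: List[float]) -> Tuple[float, List[List[int]]]:
    """Hypothetical helper: cheapest column-based layout of the unit square.

    Assumes each column holds consecutive processors once they are sorted by
    nondecreasing area; returns (sum of half-perimeters, columns of indices).
    """
    s = normalize(speeds)
    order = sorted(range(len(s)), key=lambda i: s[i])
    p = len(order)
    prefix = [0.0] + list(accumulate(s[i] for i in order))

    # best[j]: minimal cost of placing the first j sorted processors in columns;
    # cut[j]: start index of the last column in that optimal placement.
    best = [0.0] + [float("inf")] * p
    cut = [0] * (p + 1)
    for j in range(1, p + 1):
        for i in range(j):
            width = prefix[j] - prefix[i]  # column width = total area it holds
            k = j - i                      # rectangles stacked in this column
            # Each rectangle contributes width + its height; heights sum to 1.
            cost = best[i] + k * width + 1.0
            if cost < best[j]:
                best[j], cut[j] = cost, i

    # Walk the cut points back to recover the columns.
    columns: List[List[int]] = []
    j = p
    while j > 0:
        i = cut[j]
        columns.append([order[t] for t in range(i, j)])
        j = i
    columns.reverse()
    return best[p], columns


if __name__ == "__main__":
    cost, cols = column_partition([1, 1, 1, 1])
    print(cost, cols, lower_bound([1, 1, 1, 1]))
```

For four equal-speed processors, column_partition([1, 1, 1, 1]) returns (4.0, [[0, 1], [2, 3]]). This is a 2 x 2 grid whose cost matches lower_bound([1, 1, 1, 1]) = 4.0. With unequal speeds, the gap between the two values indicates how far the column layout is from the (generally unattainable) all-squares bound.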

Index Terms:
Parallel algorithms, load balancing, communication volume, matrix multiplication, numerical linear algebra libraries, heterogeneous platforms, cluster computing, metacomputing.
Citation:
Olivier Beaumont, Vincent Boudet, Fabrice Rastello, Yves Robert, "Matrix Multiplication on Heterogeneous Platforms," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 10, pp. 1033-1051, Oct. 2001, doi:10.1109/71.963416