A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)
October 2001 (vol. 50 no. 10)
pp. 1052-1070

Abstract—In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: Its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
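To illustrate why the unidimensional case is tractable, here is a minimal sketch of one common greedy allocation. It is not taken from the paper itself. It assumes each processor i is described by a hypothetical cycle time `cycle_times[i]` (time to process one block) and that the goal is to distribute `num_blocks` equal-sized blocks so that the slowest-finishing processor finishes as early as possible. The function names are illustrative only.

```python
def allocate_blocks(cycle_times, num_blocks):
    """Greedily assign blocks to processors of different speeds.

    cycle_times[i] is the (assumed) time processor i needs per block.
    Each block goes to the processor whose finish time would be
    smallest after receiving it; ties go to the lowest index.
    """
    counts = [0] * len(cycle_times)
    for _ in range(num_blocks):
        best = min(
            range(len(cycle_times)),
            key=lambda i: (counts[i] + 1) * cycle_times[i],
        )
        counts[best] += 1
    return counts


def makespan(cycle_times, counts):
    """Finish time of the slowest processor for a given allocation."""
    return max(c * t for c, t in zip(counts, cycle_times))


if __name__ == "__main__":
    times = [1, 2, 4]  # processor 0 is four times faster than processor 2
    balanced = allocate_blocks(times, 7)
    uniform = [3, 2, 2]  # a near-uniform split of 7 blocks, for comparison
    print(balanced, makespan(times, balanced))
    print(uniform, makespan(times, uniform))
```

With these assumed cycle times, the uniform split is held back by the slowest processor, which is the limitation the abstract attributes to uniform block-cyclic distribution. The two-dimensional case adds the constraint that the allocation must form a grid, and this simple greedy does not carry over to it.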

Index Terms:
Heterogeneous network, heterogeneous grid, different-speed processors, load-balancing, data distribution, data allocation, numerical libraries, numerical linear algebra, heterogeneous platforms, cluster computing.
Citation:
Olivier Beaumont, Vincent Boudet, Antoine Petitet, Fabrice Rastello, Yves Robert, "A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)," IEEE Transactions on Computers, vol. 50, no. 10, pp. 1052-1070, Oct. 2001, doi:10.1109/12.956091