
Olivier Beaumont, Vincent Boudet, Antoine Petitet, Fabrice Rastello, Yves Robert, "A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)," IEEE Transactions on Computers, vol. 50, no. 10, pp. 1052-1070, October 2001. DOI: 10.1109/12.956091. ISSN: 0018-9340. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.

Index Terms: Heterogeneous network, heterogeneous grid, different-speed processors, load-balancing, data distribution, data allocation, numerical libraries, numerical linear algebra, heterogeneous platforms, cluster computing.
Abstract: In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: Its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
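
The abstract's point about the slowest processor can be illustrated for the unidimensional case. The following is a minimal sketch, not the allocation algorithm from the paper: it assumes each processor is described by a hypothetical "cycle time" (time units needed to process one block), and it compares a uniform split with a simple greedy rule that gives each block to the processor that would finish it earliest. The function names and the sample cycle times are illustrative assumptions.

```python
def uniform_allocation(n_blocks, cycle_times):
    """Split blocks evenly, ignoring processor speed (the homogeneous-style split)."""
    p = len(cycle_times)
    base, extra = divmod(n_blocks, p)
    # The first `extra` processors receive one additional block.
    return [base + (1 if i < extra else 0) for i in range(p)]


def greedy_allocation(n_blocks, cycle_times):
    """Assign blocks one at a time to the processor that would finish soonest.

    Illustrative assumption: a processor holding c blocks finishes at time
    c * cycle_time, and blocks are independent units of equal work.
    """
    counts = [0] * len(cycle_times)
    for _ in range(n_blocks):
        best = min(
            range(len(cycle_times)),
            key=lambda i: (counts[i] + 1) * cycle_times[i],
        )
        counts[best] += 1
    return counts


def makespan(counts, cycle_times):
    """Completion time of the whole step: the most heavily loaded processor."""
    return max(c * t for c, t in zip(counts, cycle_times))


if __name__ == "__main__":
    cycle_times = [1, 2, 4]  # hypothetical: processor 0 is 4x faster than processor 2
    n_blocks = 14

    uniform = uniform_allocation(n_blocks, cycle_times)
    greedy = greedy_allocation(n_blocks, cycle_times)

    print(uniform, makespan(uniform, cycle_times))  # [5, 5, 4] 16
    print(greedy, makespan(greedy, cycle_times))    # [8, 4, 2] 8
```

With these sample values the uniform split is held back by the slowest processor (16 time units), while the speed-aware split finishes in 8. The sketch covers only a one-dimensional arrangement of processors; the two-dimensional problem that the abstract describes as NP-complete is not addressed here.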