
Benjamin Charny, "Matrix Partitioning on a Virtual Shared Memory Parallel Machine," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 4, pp. 343-355, April 1996. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1045-9219. DOI: 10.1109/71.494629.

Keywords: basic matrix operations, matrix partitioning, minimax criteria, numerical algorithm, optimum load balance, parallel multiprocessor, performance, virtual shared memory.

Abstract: The general problem considered in the paper is how to partition a matrix operation among the processors of a parallel system in an optimally load-balanced way, without potential memory contention. The parallel system is defined by several features, chief among them a virtual shared memory divided into segments. If a partitioning causes several processors to access the same memory segment in parallel, and at least one of them writes to that segment, the processors contend for it and performance degrades. To eliminate this situation, the class of admissible partitionings is restricted so that no two processors write data to the same segment. Within the resulting class of contention-free partitionings, a load-balanced optimum partitioning is defined as one that satisfies independent minimax criteria. The main result of the paper is an algorithm that finds the optimum partitioning by solving the respective minimax problems analytically. The paper also discusses implementation and performance issues related to the algorithm, based on experience at Kendall Square Research Corporation, where the partitioning algorithm was used to create high-performance parallel matrix libraries.
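
The contention-free restriction described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a one-dimensional output array of `n` elements, a fixed `segment_size` measured in elements, and contiguous index ranges per processor, and the function names are hypothetical. Whole segments are handed out as evenly as possible, so the largest number of segments written by any one processor is as small as it can be under these assumptions.

```python
def contention_free_partition(n, segment_size, num_procs):
    """Split indices 0..n-1 into contiguous half-open ranges, one per processor.

    Every range starts on a segment boundary, so no two processors write
    to the same segment. Illustrative only; all parameters are assumptions.
    """
    if n < 0 or segment_size <= 0 or num_procs <= 0:
        raise ValueError("n must be >= 0; segment_size and num_procs must be > 0")

    num_segments = -(-n // segment_size)  # ceiling division
    base, extra = divmod(num_segments, num_procs)

    ranges = []
    first_segment = 0
    for proc in range(num_procs):
        # The first `extra` processors take one additional segment each.
        count = base + (1 if proc < extra else 0)
        start = min(first_segment * segment_size, n)
        stop = min((first_segment + count) * segment_size, n)
        ranges.append((start, stop))
        first_segment += count
    return ranges


def segments_written(index_range, segment_size):
    """Return the set of segment numbers touched by a half-open index range."""
    start, stop = index_range
    if start >= stop:
        return set()
    return set(range(start // segment_size, (stop - 1) // segment_size + 1))


if __name__ == "__main__":
    # 100 elements, 16 elements per segment, 3 processors.
    for proc, r in enumerate(contention_free_partition(100, 16, 3)):
        print(proc, r, sorted(segments_written(r, 16)))
```

A processor may receive an empty range when there are fewer segments than processors. Balancing on whole segments trades some evenness in element counts for the guarantee that the written segments never overlap.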