Matrix Partitioning on a Virtual Shared Memory Parallel Machine
April 1996 (vol. 7 no. 4)
pp. 343-355

Abstract: This paper considers the general problem of partitioning a matrix operation among the processors of a parallel system in an optimally load-balanced way, without potential memory contention. The parallel system is defined by several features, chief among them a virtual shared memory divided into segments. If a partitioning causes processors to access the same memory segment in parallel, and at least one of them writes to that segment, contention arises between the processors and performance degrades. To eliminate this situation, the class of admissible partitionings is restricted so that no two processors write data to the same segment. On the resulting class of contention-free partitionings, a load-balanced optimum partitioning is defined as one satisfying independent minimax criteria. The main result of the paper is an algorithm that finds the optimum partitioning by solving the respective minimax problems analytically. The paper also discusses implementation and performance issues related to the algorithm, drawing on experience at Kendall Square Research Corporation, where the partitioning algorithm was used to create high-performance parallel matrix libraries.
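
To make the contention-free restriction concrete, here is a minimal one-dimensional sketch in Python. It is not the algorithm from the paper, which this abstract does not reproduce. It assumes a contiguous array of `n_elems` elements laid out over fixed-size memory segments of `seg_size` elements, with one processor writing each range; the function name and parameters are hypothetical. Every range boundary is placed on a segment boundary so that no two processors write to the same segment, and whole segments are dealt out as evenly as possible, with any extra segment going to the last processors so that the possibly partial final segment lands in one of the larger shares.

```python
def partition_segments(n_elems, seg_size, n_procs):
    """Split [0, n_elems) into contiguous, segment-aligned ranges.

    Hypothetical helper for illustration only. Returns a list of
    (lo, hi) half-open element ranges, one per processor that
    receives work. No segment is shared between two ranges.
    """
    if n_elems < 0 or seg_size <= 0 or n_procs <= 0:
        raise ValueError("n_elems must be >= 0; seg_size and n_procs must be > 0")

    # Number of segments touched by the array (ceiling division).
    n_segs = -(-n_elems // seg_size)

    # Every processor gets `base` segments; the last `extra`
    # processors get one more.
    base, extra = divmod(n_segs, n_procs)

    ranges = []
    start_seg = 0
    for p in range(n_procs):
        count = base + (1 if p >= n_procs - extra else 0)
        if count == 0:
            continue  # more processors than segments: this one stays idle
        lo = start_seg * seg_size
        # The final segment may be only partly occupied by the array.
        hi = min((start_seg + count) * seg_size, n_elems)
        ranges.append((lo, hi))
        start_seg += count
    return ranges


if __name__ == "__main__":
    # 100 elements, 16 elements per segment, 3 processors.
    print(partition_segments(100, 16, 3))
```

With 100 elements, 16-element segments, and 3 processors, the array spans 7 segments, which are split 2, 2, and 3. The largest share is then 36 elements; an unaligned even split would give about 33 or 34 each but would place range boundaries inside segments, which is exactly the shared-write situation the restriction rules out.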

Index Terms:
Basic matrix operations, matrix partitioning, minimax criteria, numerical algorithm, optimum load balance, parallel multiprocessor, performance, virtual shared memory.
Benjamin Charny, "Matrix Partitioning on a Virtual Shared Memory Parallel Machine," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 4, pp. 343-355, April 1996, doi:10.1109/71.494629