Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor
October 1991 (vol. 17 no. 10)
pp. 1059-1068

The parallel processing performance of a NUMA (nonuniform memory access) multiprocessor is determined by the efficiency of its basic operations. The authors present several analytical models for predicting and evaluating the overhead of interprocessor communication, process scheduling, process synchronization, and remote memory access, taking network contention and memory contention into account. Performance measurements on the BBN GP1000, a NUMA shared-memory multiprocessor, support the models and analyses through several numerical examples. Together, the analytical and experimental results give a comprehensive understanding of these effects, which are important for the effective use of a NUMA shared-memory multiprocessor. The results can be used to determine optimal strategies for developing an efficient programming environment for a NUMA system.

Index Terms:
nonuniform memory access; parallel processing performance; analytical models; interprocessor communication; process scheduling; process synchronization; remote memory access; network contention; memory contention; BBN GP1000; NUMA shared-memory multiprocessor; optimal strategies; programming environment; multiprocessing systems; parallel processing; performance evaluation; scheduling
X. Zhang, X. Qin, "Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor," IEEE Transactions on Software Engineering, vol. 17, no. 10, pp. 1059-1068, Oct. 1991, doi:10.1109/32.99193