Workload Execution Strategies and Parallel Speedup on Clustered Computers
November 1999 (vol. 48 no. 11)
pp. 1173-1182

Abstract—A model of system performance for parallel processing on clustered multiprocessors is developed that unifies multiprogramming with speedup and scaled speedup. The model is used to explore processor-to-process allocation alternatives for executing a workload consisting of multiple processes. Heuristics are developed that relate cluster size to the parallel fraction of a program and to process scaling factors. The basic analytical model is then refined by incorporating factors that affect the realizable speedup, including explicit process scaling, Degree of Parallelism (DOP) as a discrete function, and communication overhead. Further extensions incorporate nonuniform workloads, the interconnection network's probability of accepting requests, nonuniform memory access, and multithreaded processes.
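The paper's unified model is not reproduced on this page. As background, the classical fixed-size (Amdahl) and scaled (Gustafson) speedup formulas that the abstract's model builds on can be sketched as follows; the function names and the communication-overhead parameter `c` are illustrative assumptions, not the paper's notation:

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Fixed-size speedup on n processors for a program whose
    parallelizable fraction of serial runtime is f."""
    return 1.0 / ((1.0 - f) + f / n)

def gustafson_speedup(f: float, n: int) -> float:
    """Scaled speedup: the parallel part of the problem is assumed
    to grow with n (Gustafson's law)."""
    return (1.0 - f) + f * n

def amdahl_with_overhead(f: float, n: int, c: float) -> float:
    """Amdahl speedup with a communication overhead c, expressed
    here as a fraction of the serial runtime (an assumed form)."""
    return 1.0 / ((1.0 - f) + f / n + c)

if __name__ == "__main__":
    # With f = 0.9, fixed-size speedup is capped at 1/(1 - 0.9) = 10
    # no matter how many processors the cluster provides.
    print(amdahl_speedup(0.9, 16))             # 6.4
    print(gustafson_speedup(0.9, 16))          # 14.5
    print(amdahl_with_overhead(0.9, 16, 0.05)) # overhead lowers speedup
```

The gap between the first two numbers illustrates why the paper distinguishes speedup from scaled speedup when relating cluster size to the parallel fraction.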

[1] L.N. Bhuyan, “An Analysis of Processor-Memory Interconnection Networks,” IEEE Trans. Computers, vol. 34, no. 3, Mar. 1985.
[2] D.L. Eager, J. Zahorjan, and E.D. Lazowska, “Speedup versus Efficiency in Parallel Systems,” IEEE Trans. Computers, vol. 38, no. 3, pp. 408-423, Mar. 1989.
[3] J.L. Gustafson, “Reevaluating Amdahl's Law,” Comm. ACM, vol. 31, no. 5, pp. 532-533, 1988.
[4] D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy, “The DASH Prototype: Implementation and Performance,” Proc. 19th Ann. Int'l Symp. Computer Architecture, pp. 92-102, May 1992.
[5] D.E. Lenoski and W.D. Weber, Scalable Shared-Memory Multiprocessing. San Francisco: Morgan Kaufmann, 1995.
[6] K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.
[7] C.E. Leiserson, Z.S. Abuhamdeh, D.C. Douglas, C.R. Feynman, M.N. Ganmukhi, J.V. Hill, W.D. Hillis, B.C. Kuszmaul, M.A. St. Pierre, D.S. Wells, M.C. Wong, S.-W. Yang, and R. Zak, “The Network Architecture of the Connection Machine CM-5,” Proc. Fourth Ann. Symp. Parallel Algorithms and Architectures, pp. 272-285, June 1992.
[8] T. Lovett and S. Thakkar, “The Symmetry Multiprocessor System,” Proc. 1988 Int'l Conf. Parallel Processing, pp. 303-310, 1988.
[9] S.A. Mabbs and K.E. Forward, “Performance Analysis of MR-1, a Clustered Shared-Memory Multiprocessor,” J. Parallel and Distributed Computing, vol. 20, pp. 158-175, 1994.
[10] H. Wang, Y. Jian, and H. Wu, “Performance Analysis of Cluster-Based PPMB Multiprocessor Systems,” The Computer J., vol. 38, no. 5, 1995.
[11] A.H. Karp and H.P. Flatt, “Measuring Parallel Processor Performance,” Comm. ACM, vol. 33, no. 5, pp. 539-543, 1990.
[12] D.A. Wood and M.D. Hill, “Cost-Effective Parallel Computing,” Computer, pp. 69-72, Feb. 1995.
[13] E.A. Carmona, “Modeling the Serial and Parallel Fractions of a Parallel Program,” J. Parallel and Distributed Computing, vol. 13, pp. 286-298, 1991.
[14] F.A. Van-Catledge, “Toward a General Model for Evaluating the Relative Performance of Computer Systems,” Int'l J. Supercomputer Applications, vol. 3, no. 2, pp. 100-108, Summer 1989.
[15] X.H. Sun and J.L. Gustafson, “Toward a Better Parallel Performance Metric,” Parallel Computing, vol. 17, pp. 1093-1109, 1991.
[16] P. Mohapatra, C.R. Das, and T. Feng, “Performance Analysis of Cluster-Based Multiprocessors,” IEEE Trans. Computers, vol. 43, no. 1, Jan. 1994.

Index Terms:
Parallel processing, speedup, clustering, scaling, efficiency.
Kenneth E. Hoganson, "Workload Execution Strategies and Parallel Speedup on Clustered Computers," IEEE Transactions on Computers, vol. 48, no. 11, pp. 1173-1182, Nov. 1999, doi:10.1109/12.811107