Integrated Performance Models for SPMD Applications and MIMD Architectures
July 2002 (vol. 13 no. 7)
pp. 745-757

This paper introduces queuing network models for the performance analysis of SPMD applications executed on general-purpose parallel architectures such as MIMD machines and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processor and I/O parallelism and show the effects of different hardware and software components on performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., number of processors, number of disks, I/O topology, etc.).
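
To give a rough idea of what a speedup surface looks like, the sketch below tabulates speedup as a function of both the number of processors and the number of disks. It is not the queuing network model of the paper: it assumes a simple Amdahl-style decomposition in which computation time divides evenly across processors, I/O time divides evenly across disks, and a serial part does not scale. The function names and the default times (`t_comp`, `t_io`, `t_serial`) are hypothetical values chosen only for illustration.

```python
def speedup(p, d, t_comp=80.0, t_io=15.0, t_serial=5.0):
    """Speedup on p processors and d disks relative to p = d = 1.

    Assumed toy model (not the paper's): computation scales with p,
    I/O scales with d, and t_serial does not scale at all.
    """
    t_one = t_serial + t_comp + t_io
    t_par = t_serial + t_comp / p + t_io / d
    return t_one / t_par


def speedup_surface(max_p, max_d, **times):
    """Return a grid where surface[p - 1][d - 1] is speedup(p, d)."""
    return [
        [speedup(p, d, **times) for d in range(1, max_d + 1)]
        for p in range(1, max_p + 1)
    ]


if __name__ == "__main__":
    # One row per processor count, one column per disk count.
    for p, row in enumerate(speedup_surface(4, 3), start=1):
        print(p, " ".join(f"{s:5.2f}" for s in row))
```

Even in this simplified setting the surface shows why the two kinds of parallelism have to be considered together: adding processors with a single disk flattens out once I/O time dominates, and adding disks recovers part of the lost speedup.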

Index Terms:
Single program multiple data (SPMD), multiple instruction multiple data (MIMD), performance model, queuing network model, fork-join queues, mean value analysis (MVA), parallel I/O, synchronization overhead, speedup surface.
Paolo Cremonesi, Claudio Gennaro, "Integrated Performance Models for SPMD Applications and MIMD Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 7, pp. 745-757, July 2002, doi:10.1109/TPDS.2002.1019862