December 2002 (vol. 13 no. 12)
pp. 1320-1332

Abstract: This paper introduces queuing network models for the performance analysis of SPMD applications executed on general-purpose parallel architectures such as MIMD machines and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processor and I/O parallelism and show the effects of different hardware and software components on performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (e.g., number of processors, number of disks, and I/O topology).
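
To make the idea of a speedup surface concrete, the following is a minimal sketch and not the model developed in the paper. It assumes a closed queuing network in which each of `n_procs` processes alternates between a compute phase on its own processor (modeled as a delay of `cpu_time`) and an I/O phase whose total demand `io_time` is striped evenly over `n_disks` disks. It solves that network with the standard exact mean value analysis (MVA) recursion and ignores communication and fork-join synchronization. All function names and parameter values are hypothetical, and speedup is taken here as the ratio of cycle throughputs.

```python
def mva_throughput(n_procs, n_disks, cpu_time, io_time):
    """Exact single-class MVA for a toy closed network.

    Assumptions (illustrative only): processors act as a pure delay,
    disks are identical queueing stations, and the I/O demand is
    striped evenly across the disks.
    """
    demand = io_time / n_disks       # service demand per disk per cycle
    queue = [0.0] * n_disks          # mean queue lengths with 0 processes
    throughput = 0.0
    for n in range(1, n_procs + 1):
        # Residence time at each disk, as seen by an arriving process.
        resid = [demand * (1.0 + q) for q in queue]
        # Cycles completed per unit time with n processes in the system.
        throughput = n / (cpu_time + sum(resid))
        # Little's law gives the new mean queue lengths.
        queue = [throughput * r for r in resid]
    return throughput


def speedup(n_procs, n_disks, cpu_time=8.0, io_time=2.0):
    """Throughput relative to one processor and one disk."""
    base = mva_throughput(1, 1, cpu_time, io_time)
    return mva_throughput(n_procs, n_disks, cpu_time, io_time) / base


# Sample the surface on a small grid of (processors, disks).
surface = {(p, d): speedup(p, d) for p in (1, 2, 4, 8, 16) for d in (1, 2, 4)}

if __name__ == "__main__":
    for (p, d), s in sorted(surface.items()):
        print(f"processors={p:2d} disks={d} speedup={s:.2f}")
```

In this sketch the disks become the bottleneck as processors are added, so the surface flattens along the processor axis unless the number of disks grows as well, which is the kind of interplay between processor and I/O parallelism that the abstract describes.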

Index Terms:
Single program multiple data (SPMD), multiple instruction multiple data (MIMD), performance model, queuing network model, fork-join queues, mean value analysis (MVA), parallel I/O, synchronization overhead, speedup surface.
Citation:
Paolo Cremonesi, Claudio Gennaro, "Integrated Performance Models for SPMD Applications and MIMD Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 12, pp. 1320-1332, Dec. 2002, doi:10.1109/TPDS.2002.1158268