
Paolo Cremonesi, Claudio Gennaro, "Integrated Performance Models for SPMD Applications and MIMD Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 12, pp. 1320-1332, December 2002. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1045-9219. DOI: 10.1109/TPDS.2002.1158268

Index Terms: Single program multiple data (SPMD), multiple instruction multiple data (MIMD), performance model, queuing network model, fork-join queues, mean value analysis (MVA), parallel I/O, synchronization overhead, speedup surface.
Abstract: This paper introduces queuing network models for the performance analysis of SPMD applications executed on general-purpose parallel architectures such as MIMD machines and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processor and I/O parallelism and show the effects of different hardware and software components on performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., number of processors, number of disks, I/O topology, etc.).
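To make the idea of a speedup surface concrete, the following is a minimal sketch and not the paper's model. It uses the standard exact single-class mean value analysis (MVA) recursion for a closed product-form network and ignores the fork-join synchronization that the paper addresses. The function names (`mva`, `speedup`), the parameters `cpu_time` and `io_time`, and the assumption that each task owns a processor and spreads its I/O evenly over the disks are all hypothetical choices made for illustration.

```python
def mva(demands, n_customers, think_time=0.0):
    """Exact single-class MVA for a closed network of queueing stations.

    demands     -- service demand per cycle at each queueing station
    n_customers -- number of circulating customers (here: parallel tasks)
    think_time  -- time per cycle spent at a delay (no-queueing) station
    Returns the system throughput in cycles per unit time.
    """
    queue = [0.0] * len(demands)  # mean queue lengths with 0 customers
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue of n-1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        queue = [throughput * r for r in residence]  # Little's law
    return throughput


def speedup(processors, disks, cpu_time=1.0, io_time=0.5):
    """Toy speedup relative to 1 processor and 1 disk (illustrative only).

    Assumption: each task has its own processor (modelled as a delay of
    cpu_time) and its I/O demand io_time is balanced across all disks.
    """
    base = mva([io_time], 1, think_time=cpu_time)
    scaled = mva([io_time / disks] * disks, processors, think_time=cpu_time)
    return scaled / base


if __name__ == "__main__":
    # Tabulate the surface: rows are processor counts, columns are disk counts.
    for p in (1, 2, 4, 8):
        print(p, [round(speedup(p, d), 2) for d in (1, 2, 4)])
```

Under these assumptions, adding processors alone saturates once the disks become the bottleneck, while adding disks raises that ceiling. This is the kind of interplay between processor and I/O parallelism that a speedup surface is meant to expose.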