
Paolo Cremonesi, Claudio Gennaro, "Integrated Performance Models for SPMD Applications and MIMD Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 7, pp. 745-757, July 2002.  
DOI: 10.1109/TPDS.2002.1019862  
ISSN: 1045-9219  
Publisher: IEEE Computer Society, Los Alamitos, CA, USA  
Keywords: single program multiple data (SPMD), multiple instruction multiple data (MIMD), performance model, queuing network model, fork-join queues, mean value analysis (MVA), parallel I/O, synchronization overhead, speedup surface.  
This paper introduces queuing network models for the performance analysis of SPMD applications executed on general-purpose parallel architectures such as MIMD machines and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processor and I/O parallelism and show how different hardware and software components affect performance. Because the model parameters correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance of a parallel application as a function of the target architecture (e.g., number of processors, number of disks, and I/O topology).
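
To make the idea of a speedup surface concrete, here is a minimal Python sketch. It is not the paper's model: it uses the textbook exact mean value analysis (MVA) recursion for a closed network with one delay center and several queueing centers, and it ignores the fork-join synchronization and communication costs that the paper addresses. The function names, the workload figures (`cpu_work`, `io_work`), and the assumption that I/O is striped evenly across disks are all hypothetical and chosen only for illustration.

```python
def mva_throughput(n_customers, think_time, demands):
    """Exact MVA for a closed network: one delay center plus queueing centers.

    demands[k] is the service demand of one cycle at queueing center k.
    Returns the steady-state throughput in cycles per time unit.
    """
    queue = [0.0] * len(demands)  # mean queue lengths with n-1 customers
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # An arriving customer sees the queue left by n-1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        queue = [throughput * r for r in residence]  # Little's law
    return throughput


def speedup(p, d, cpu_work=90.0, io_work=10.0):
    """Toy speedup for p processors and d disks (assumed workload split).

    Assumptions: each process owns a processor (modeled as a delay center),
    the fixed workload is divided evenly among the p processes, and each
    process's I/O is spread evenly over the d disks.
    """
    think = cpu_work / p            # compute time per process
    per_disk = io_work / (p * d)    # I/O demand per process at each disk
    x = mva_throughput(p, think, [per_disk] * d)
    parallel_time = p / x           # time for all p processes to finish a cycle
    return (cpu_work + io_work) / parallel_time


def speedup_surface(max_p, max_d, **workload):
    """Rows index processors 1..max_p, columns index disks 1..max_d."""
    return [[speedup(p, d, **workload) for d in range(1, max_d + 1)]
            for p in range(1, max_p + 1)]


if __name__ == "__main__":
    for row in speedup_surface(4, 3):
        print(" ".join(f"{s:6.3f}" for s in row))
```

Under these assumptions the surface rises along both axes, with disk contention keeping the speedup below the processor count; adding disks recovers part of that loss. A model closer to the paper's would replace the independent disk queues with fork-join queues and add synchronization overhead.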