Issue No.07 - July (2009 vol.20)
pp: 1038-1049
Julie Wenbin Zhu , Xilinx, Inc., Albuquerque
Patrick G. Bridges , University of New Mexico, Albuquerque
Arthur (Barney) B. Maccabe , University of New Mexico, Albuquerque
Understanding and tuning the performance of large-scale, long-running applications is difficult: both standard trace-based and statistical methods have substantial shortcomings that limit their usefulness. This paper describes a new performance monitoring approach called Embedded Gossip (EG), designed to enable lightweight online performance monitoring and tuning. EG works by piggybacking performance information on existing messages and correlating that information online, giving each process in a parallel application a weakly consistent global view of the behavior of the entire application. To demonstrate the viability of EG, this paper presents the design and experimental evaluation of two different online monitoring systems and an online global adaptation system driven by EG. In addition, we present a metric system for evaluating the suitability of an application to EG-based monitoring and adaptation, a general architecture for implementing EG-based monitoring systems, and a modified global commit algorithm appropriate for use in EG-based global adaptation systems. Together, these results demonstrate that EG is an efficient, low-overhead approach for addressing a wide range of parallel performance monitoring tasks and that results from these systems can effectively drive online global adaptation.
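As a rough illustration of the piggybacking idea described in the abstract, the following minimal sketch shows how performance data attached to ordinary application messages could give each process a weakly consistent view of the others. It is not the paper's implementation: the `Process` class, its methods, the per-rank version counter, and the choice to attach the entire view to every message are all assumptions made for illustration. A genuinely lightweight system would presumably bound the amount of data attached to each message.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for one process of a parallel application."""
    rank: int
    # rank -> (version, metric value); a higher version means a newer sample
    view: dict = field(default_factory=dict)
    version: int = 0

    def record(self, value):
        """Store a new local performance sample under this process's rank."""
        self.version += 1
        self.view[self.rank] = (self.version, value)

    def send(self, payload):
        """Attach a copy of the current view to an outgoing application message."""
        return {"payload": payload, "gossip": dict(self.view)}

    def receive(self, message):
        """Merge piggybacked entries, keeping the newest version seen per rank."""
        for rank, (version, value) in message["gossip"].items():
            if rank not in self.view or version > self.view[rank][0]:
                self.view[rank] = (version, value)
        return message["payload"]


# Process 2 learns about process 0 without ever exchanging a message with it.
p0, p1, p2 = Process(0), Process(1), Process(2)
p0.record(1.5)
p1.receive(p0.send("halo exchange"))
p1.record(2.0)
p2.receive(p1.send("halo exchange"))
print(sorted(p2.view))
```

No extra messages are sent in this sketch; information spreads only as fast as the application's own communication pattern allows, which is why each process's view is only weakly consistent.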
Lightweight performance monitoring, dynamic performance tuning, support for adaptation, parallel systems.
Julie Wenbin Zhu, Patrick G. Bridges, Arthur (Barney) B. Maccabe, "Lightweight Online Performance Monitoring and Tuning with Embedded Gossip", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 7, pp. 1038-1049, July 2009, doi:10.1109/TPDS.2008.126