Distributed Performance Monitoring: Methods, Tools, and Applications
June 1994 (vol. 5 no. 6)
pp. 585-598

A method for analyzing the functional behavior and the performance of programs in distributed systems is presented. We use hybrid monitoring, a technique that combines the advantages of software monitoring and hardware monitoring. The paper describes a hardware monitor and a software package (ZM4/SIMPLE) that make our concepts available to programmers, assisting them in debugging and tuning their code. A short survey of related monitor systems highlights the distinguishing features of our implementation. As an application of our monitoring and evaluation system, we describe the analysis of a parallel ray tracing program running on the SUPRENUM multiprocessor. We show that monitoring and modeling both rely on a common abstraction of a system's dynamic behavior and can therefore be integrated into one comprehensive methodology, which is supported by a set of tools.
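The event-trace idea behind hybrid monitoring can be illustrated with a small sketch. It assumes (this is not ZM4/SIMPLE's actual trace format or interface) that software instrumentation in each process emits an event token, that the hardware monitor attaches a timestamp from a globally synchronized clock, and that each node's trace is therefore already sorted by time. Evaluation then merges the per-node traces into one global trace and derives activity durations from begin/end event pairs. The `Event` record, `merge_traces`, `activity_durations`, and the token names are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Event:
    # Hypothetical record: global timestamp (assumed to come from a
    # synchronized monitor clock), the node that produced it, and the
    # event token written by the software instrumentation.
    timestamp_ns: int
    node: str
    token: str


def merge_traces(traces):
    """Merge per-node traces, each already sorted by time, into one global trace."""
    return list(heapq.merge(*traces))


def activity_durations(trace, begin_token, end_token):
    """Sum, per node, the time spent between matching begin and end events."""
    open_since = {}  # node -> timestamp of the currently open begin event
    totals = {}      # node -> accumulated activity time
    for ev in trace:
        if ev.token == begin_token:
            open_since[ev.node] = ev.timestamp_ns
        elif ev.token == end_token and ev.node in open_since:
            start = open_since.pop(ev.node)
            totals[ev.node] = totals.get(ev.node, 0) + ev.timestamp_ns - start
    return totals


if __name__ == "__main__":
    # Two hypothetical worker nodes of a parallel program.
    node_a = [Event(100, "A", "compute_begin"), Event(400, "A", "compute_end")]
    node_b = [Event(150, "B", "compute_begin"), Event(250, "B", "compute_end"),
              Event(300, "B", "compute_begin"), Event(600, "B", "compute_end")]
    trace = merge_traces([node_a, node_b])
    print([e.timestamp_ns for e in trace])  # [100, 150, 250, 300, 400, 600]
    print(activity_durations(trace, "compute_begin", "compute_end"))  # {'B': 400, 'A': 300}
```

Because every event carries a timestamp from a common clock, the merge needs no logical-clock reconstruction; with purely software-generated local timestamps the ordering step would be considerably harder.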

Index Terms: program debugging; distributed processing; performance evaluation; system monitoring; performance monitoring; distributed systems; functional behavior; hybrid monitoring; parallel ray tracing program; monitoring; dynamic behavior; common abstraction; debugging; tuning; SUPRENUM
R. Hofmann, R. Klar, B. Mohr, A. Quick, M. Siegle, "Distributed Performance Monitoring: Methods, Tools, and Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 6, pp. 585-598, June 1994, doi:10.1109/71.285605