Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications
January 1993 (vol. 4 no. 1)
pp. 28-40

The authors describe Mtool, a software tool for analyzing performance losses in shared-memory parallel programs. Mtool augments a program with low-overhead instrumentation that perturbs the program's execution as little as possible while still generating enough information to isolate memory and synchronization bottlenecks. After running the instrumented version of the parallel program, the programmer can use Mtool's window-based user interface to view compute time, memory, and synchronization objects. The authors describe Mtool's low-overhead instrumentation methods, memory bottleneck detection technique, and attention-focusing mechanisms; contrast Mtool with other approaches; and offer a case study to demonstrate its effectiveness.

Index Terms:
performance losses analysis; integrated system; performance debugging; shared memory multiprocessor applications; Mtool; software tool; shared memory parallel programs; low overhead instrumentation; synchronization bottlenecks; window-based user interface; compute time; synchronization objects; memory bottleneck detection; parallel programming; performance evaluation; program debugging; shared memory systems; software tools
A.J. Goldberg, J.L. Hennessy, "Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 1, pp. 28-40, Jan. 1993, doi:10.1109/71.205651