Jeffrey K. Hollingsworth, "Critical Path Profiling of Message Passing and Shared-Memory Programs," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 10, pp. 1029-1040, October 1998. doi:10.1109/71.730530
Abstract: In this paper, we introduce a runtime, non-trace-based algorithm to compute the critical path profile of the execution of message passing and shared-memory parallel programs. Our algorithm permits starting or stopping the critical path computation during program execution and reporting intermediate values. We also present an online algorithm to compute a variant of critical path, called critical path zeroing, that measures the reduction in application execution time that would result from improving a selected procedure. Finally, we present a brief case study to quantify the runtime overhead of our algorithm and to show that online critical path profiling can be used to find program bottlenecks.
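To make the distinction between the critical path profile and critical path zeroing concrete, here is a minimal sketch of one way such values could be tracked online without a trace. It is an illustration only, not the algorithm from the paper: it assumes each message carries a small tag of running path lengths, ignores message latency, and uses hypothetical names throughout (`Process`, `compute`, `send`, `receive`, `cp_len`, `cp_share`, `cp_zero`).

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Per-process bookkeeping. All names are hypothetical, chosen for this sketch."""
    selected: str          # procedure whose contribution we want to measure
    cp_len: float = 0.0    # longest path ending at this process's current point
    cp_share: float = 0.0  # time on that longest path spent in `selected`
    cp_zero: float = 0.0   # longest path if `selected` took zero time

    def compute(self, procedure: str, duration: float) -> None:
        """Account for `duration` units of local work in `procedure`."""
        self.cp_len += duration
        if procedure == self.selected:
            self.cp_share += duration   # counts toward the profile
        else:
            self.cp_zero += duration    # only non-selected work counts here

    def send(self) -> tuple:
        """Return the tag that would be piggybacked on an outgoing message."""
        return (self.cp_len, self.cp_share, self.cp_zero)

    def receive(self, tag: tuple) -> None:
        """Merge a sender's tag: keep whichever incoming path is longer."""
        sender_len, sender_share, sender_zero = tag
        if sender_len > self.cp_len:
            # The path through the sender is longer, so adopt it whole.
            self.cp_len, self.cp_share = sender_len, sender_share
        # The zeroed path is tracked independently of the real one.
        self.cp_zero = max(self.cp_zero, sender_zero)


# Two processes; we ask how much "solve" matters.
a = Process(selected="solve")
b = Process(selected="solve")
a.compute("solve", 5.0)
a.compute("io", 1.0)
b.compute("setup", 2.0)
b.receive(a.send())
b.compute("solve", 3.0)

print(b.cp_len)              # 9.0: length of the critical path
print(b.cp_share)            # 8.0: time on that path spent in "solve"
print(b.cp_len - b.cp_zero)  # 7.0: time saved if "solve" cost nothing
```

In this toy run the profile attributes 8 units of the 9-unit critical path to `solve`, yet removing `solve` entirely would save only 7 units, because the 2-unit `setup` path on the second process then becomes the longest one. That gap is the reason a zeroing-style metric can be more informative than the plain profile when deciding which procedure to tune.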

