14th International Conference on Distributed Computing Systems (1994)
June 21, 1994 to June 24, 1994
R.H.B. Netzer, Dept. of Comput. Sci., Brown Univ., Providence, RI, USA
S. Subramanian, Dept. of Comput. Sci., Brown Univ., Providence, RI, USA
Jian Xu, Dept. of Comput. Sci., Brown Univ., Providence, RI, USA
Debugging long-running, nondeterministic message-passing parallel programs requires incremental replay, the ability to exactly replay selected parts of an execution. To support incremental replay, we must log enough messages and checkpoint processes often enough to allow any requested replay to complete quickly. We present an adaptive tracing strategy to keep the message-logging overhead down. We let the user specify a bound on the maximum time any replay request is allowed to take. Our algorithm tracks what each process's critical path will be during a replay and logs enough messages to ensure the critical path will never exceed the bound. Overhead is kept low by not logging messages that can be recomputed during a replay. Experiments indicate that we log about 0.1-5% of the messages while still providing a reasonable bound on any replay.
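The abstract does not spell out the algorithm, so the following is only a simplified sketch of the idea of bounding a replay's critical path by selective logging, not the authors' method. It assumes zero message latency, fully parallel replay of different processes, known local computation costs, and it does not model checkpointing. The names `Process`, `cp`, `compute`, `send`, `receive`, and `bound` are all hypothetical. The idea it illustrates: an unlogged message must be regenerated during replay by re-running its sender, so the receiver's replay path inherits the sender's path. A logged message is read from the log instead, so the sender's path drops out.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical per-process state for a simplified critical-path logger."""
    name: str
    cp: float = 0.0  # estimated replay time from the last checkpoint to now
    logged: list = field(default_factory=list)  # ids of messages we chose to log

    def compute(self, duration: float) -> None:
        # Local work must be re-executed during replay, so it extends the path.
        self.cp += duration

    def send(self) -> float:
        # Assumption: a message carries the sender's critical-path length at send time.
        return self.cp

    def receive(self, msg_id: str, sender_cp: float, bound: float) -> bool:
        """Return True if the message is logged to keep the path within `bound`."""
        if max(self.cp, sender_cp) <= bound:
            # Not logged: replay regenerates the message by re-running the sender,
            # so our replay path now depends on the sender's path.
            self.cp = max(self.cp, sender_cp)
            return False
        # Logged: replay reads the message from the log, so the sender's path
        # does not extend ours. (If self.cp alone exceeded the bound, logging
        # would not help and a checkpoint would be needed; not modeled here.)
        self.logged.append(msg_id)
        return True


# Usage with a hypothetical replay bound of 10 time units.
bound = 10.0
p, q = Process("P"), Process("Q")
p.compute(4.0)
q.compute(3.0)
logged1 = q.receive("m1", p.send(), bound)  # max(3, 4) <= 10: recompute instead
p.compute(7.0)                              # P's path grows to 11
logged2 = q.receive("m2", p.send(), bound)  # max(4, 11) > 10: log it
print(logged1, logged2, q.cp, q.logged)     # False True 4.0 ['m2']
```

In this toy run only the second message is logged. Skipping `m1` costs nothing because re-running P up to its send fits within the bound, which mirrors the abstract's point that messages which can be recomputed during a replay need not be logged.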
message passing, parallel programming, program debugging, critical path analysis, data recording
R. Netzer, S. Subramanian and Jian Xu, "Critical-path-based message logging for incremental replay of message-passing programs," 14th International Conference on Distributed Computing Systems (ICDCS), Poznan, Poland, 1994, pp. 404-413.