Parallel and Distributed Processing, IEEE Symposium on (1994)
Dallas, TX, USA
Oct. 26, 1994 to Oct. 29, 1994
Gerstel, Dept. of Comput. Sci., Israel Inst. of Technol., Haifa, Israel
Zaks, Dept. of Comput. Sci., Israel Inst. of Technol., Haifa, Israel
This paper presents a practical paradigm called on-the-fly replay. The paradigm consists of running a distributed program twice at the same time: an original computation runs in the regular fashion, including the steps at which it makes non-deterministic choices, and drives a twin execution whose non-deterministic choices need not be evaluated, since they are taken from the original computation. The paradigm has several interesting uses, of which distributed debugging is particularly noteworthy. The paper describes the integration of this paradigm into a distributed debugging facility called EREBUS. The implementation was run on a distributed memory parallel machine (Intel Hypercube iPSC2), and experimental results are reported that demonstrate the advantage of the paradigm.
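The idea of an original execution driving a twin can be illustrated with a minimal sketch. This is not the EREBUS implementation, which the abstract does not detail. It assumes a toy computation whose only source of non-determinism is a single `choose` callback, and it uses two threads and an in-memory queue as stand-ins for distributed processes and their communication. All names (`run`, `on_the_fly_replay`, `channel`) are hypothetical.

```python
import queue
import random
import threading


def run(steps, choose):
    """Toy computation; `choose` is its only non-deterministic point."""
    state = 0
    trace = []
    for step in range(steps):
        choice = choose(step)
        state = (state * 3 + choice) % 1000
        trace.append((step, choice, state))
    return trace


def on_the_fly_replay(steps=20, seed=None):
    """Run the original and its twin concurrently; return both traces."""
    rng = random.Random(seed)
    channel = queue.Queue()  # carries choices from the original to the twin
    results = {}

    def original_choose(step):
        choice = rng.randrange(3)  # the choice is actually evaluated here
        channel.put(choice)        # and forwarded to the twin as it is made
        return choice

    def twin_choose(step):
        # The twin never evaluates the choice itself; it blocks until the
        # original has made it, then reuses the same value.
        return channel.get()

    def worker(name, choose):
        results[name] = run(steps, choose)

    threads = [
        threading.Thread(target=worker, args=("original", original_choose)),
        threading.Thread(target=worker, args=("twin", twin_choose)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["original"], results["twin"]


if __name__ == "__main__":
    original, twin = on_the_fly_replay(steps=10)
    print("traces identical:", original == twin)
```

Because the twin consumes choices while the original is still running, no complete log has to be recorded before replay begins. In a debugging setting, extra instrumentation could presumably be attached to the twin only, leaving the original undisturbed, although that use is an inference from the abstract rather than something this sketch demonstrates.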
on-the-fly replay, practical paradigm, distributed debugging, distributed program, EREBUS, distributed debugging facility, distributed memory parallel machine
Gerstel, Raynal, Zaks, Hurfin and Plouzeau, "On-the-fly replay: a practical paradigm and its implementation for distributed debugging," Parallel and Distributed Processing, IEEE Symposium on (SPDP), Dallas, TX, USA, 1994, pp. 266-272.