Issue No.09 - Sept. (1986 vol.12)
David J. Holding, Department of Electrical and Electronic Engineering and Applied Physics, Aston University, Birmingham B4 7ET, England
A fundamental problem in the design of error detection and recovery mechanisms for networks of cooperating asynchronous processes is preventing errors from propagating through process interaction. The recovery procedure must be a cooperative effort involving all the interacting processes, and its scope may be confined to bounded parts of the system by the conversation mechanism proposed by Randell. This paper examines the problems of error detection and recovery in a number of concurrent processes expressed as a set of communicating sequential processes (CSP). A method is proposed that uses a Petri net model to formally identify both the state and the state reachability tree of a distributed system. These are used to systematically define the boundaries of a conversation, including the recovery and test lines, which are essential parts of the fault-tolerant mechanism. The method can be used as a design tool to determine either a single conversation or a set of properly nested conversations. It can identify the full set of processes enclosed within a particular conversation, or be used to design a conversation that protects a specific functional aspect of a distributed system. The techniques described in this paper are implemented in the occam programming language, which is derived from CSP. The application of the method is illustrated by a control example.
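The abstract does not spell out the construction itself. As a rough, non-authoritative illustration of the underlying idea, the sketch below explores the reachable markings of a small place/transition net in which two hypothetical processes, `P` and `Q`, each do local work and synchronize on one occam-style channel communication, modeled as a single transition shared by both processes. Every name here (`reachability_graph`, `processes_touched`, the places and transitions) is an assumption for illustration and not Holding's formulation. The search builds a reachability graph (duplicate markings merged) rather than a tree, and it terminates only for bounded nets, because it omits the omega-markings of a full coverability construction. `processes_touched` is a deliberately simplified stand-in for deciding which processes a conversation must enclose.

```python
from collections import deque

# Hypothetical toy net: two processes P and Q, three places each.
# Place indices: 0=P.ready, 1=P.about_to_send, 2=P.done,
#                3=Q.waiting, 4=Q.received, 5=Q.done
OWNER = {0: "P", 1: "P", 2: "P", 3: "Q", 4: "Q", 5: "Q"}

# Each transition: (input places, output places); all arc weights are 1.
# "comm" models an occam-style synchronous channel: it can fire only when
# P is ready to send AND Q is ready to receive.
TRANSITIONS = {
    "P.compute": ((0,), (1,)),
    "comm": ((1, 3), (2, 4)),
    "Q.compute": ((4,), (5,)),
}

M0 = (1, 0, 0, 1, 0, 0)  # initial marking: one token at the start of each process


def enabled(marking, pre):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking[p] >= 1 for p in pre)


def fire(marking, pre, post):
    """Return the new marking produced by firing a transition."""
    m = list(marking)
    for p in pre:
        m[p] -= 1
    for p in post:
        m[p] += 1
    return tuple(m)


def reachability_graph(initial, transitions):
    """Breadth-first search over reachable markings.

    Returns (states, edges), where edges are (marking, transition, next_marking).
    Terminates only for bounded nets: no omega-marking handling is attempted.
    """
    states = {initial}
    edges = []
    queue = deque([initial])
    while queue:
        m = queue.popleft()
        for name, (pre, post) in transitions.items():
            if enabled(m, pre):
                nxt = fire(m, pre, post)
                edges.append((m, name, nxt))
                if nxt not in states:
                    states.add(nxt)
                    queue.append(nxt)
    return states, edges


def processes_touched(path, transitions, owner):
    """Simplified stand-in: the processes whose places are touched by the
    transitions fired between a candidate recovery line and test line."""
    touched = set()
    for name in path:
        pre, post = transitions[name]
        touched.update(owner[p] for p in pre + post)
    return touched


if __name__ == "__main__":
    states, edges = reachability_graph(M0, TRANSITIONS)
    for m, t, nxt in edges:
        print(m, f"--{t}-->", nxt)
    print(sorted(processes_touched(["comm"], TRANSITIONS, OWNER)))
```

For this toy net the search finds four markings joined by three transitions along a single path. Under the simplified reading above, a span containing only `P.compute` involves `P` alone, whereas any span that includes the shared `comm` transition couples `P` and `Q`, so a conversation whose recovery and test lines bracket that communication would have to enclose both processes.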
Process control, Robot kinematics, Petri nets, Software, Computer languages, Synchronous motors, recovery block, Communicating sequential processes, concurrent processes, conversation, distributed systems, fault-tolerant software, occam
David J. Holding, "Design of reliable software in distributed systems using the conversation scheme", IEEE Transactions on Software Engineering, vol. 12, no. 9, pp. 921-928, Sept. 1986, doi:10.1109/TSE.1986.6313047