CSP Methods for Identifying Atomic Actions in the Design of Fault Tolerant Concurrent Systems
July 1995 (vol. 21 no. 7)
pp. 629-639
Limiting the extent of error propagation when faults occur and localizing the subsequent error recovery are common concerns in the design of fault tolerant parallel processing systems. Both activities are made easier if the designer associates fault tolerance mechanisms with the underlying atomic actions of the system. With this in mind, this paper investigates two methods for identifying atomic actions in parallel processing systems described using CSP. The first algorithm is based on explicit trace evaluation: it enables a designer to analyze interprocess communications and thereby locate atomic action boundaries in a hierarchical fashion. The second method takes CSP descriptions of the parallel processes and uses structural arguments to infer the atomic action boundaries. This method avoids the difficulty of producing full trace sets, at the cost of a more complex algorithm.
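
As a rough illustration of the trace-based idea, and not the algorithm given in the paper, the sketch below groups processes that communicate only with one another over a recorded trace. Each such group is a candidate set of participants in an atomic action. The representation of a trace as (sender, receiver) pairs, the function name candidate_action_groups, and the process names are all assumptions made for this example. The hierarchical refinement described in the abstract is not attempted here.

```python
from collections import defaultdict


def candidate_action_groups(trace, processes):
    """Group processes that interact only with each other in `trace`.

    `trace` is a hypothetical list of (sender, receiver) communication
    events; `processes` lists every process name in the system.
    Returns a sorted list of sorted groups (connected components of
    the communication graph), found with a simple union-find.
    """
    parent = {p: p for p in processes}

    def find(p):
        # Follow parent links to the representative, halving the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Every communication merges the groups of its two endpoints.
    for sender, receiver in trace:
        root_s, root_r = find(sender), find(receiver)
        if root_s != root_r:
            parent[root_s] = root_r

    groups = defaultdict(set)
    for p in processes:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical trace: P1 and P2 exchange messages, P3 talks to P4,
# and P5 does not communicate at all.
trace = [("P1", "P2"), ("P3", "P4"), ("P2", "P1")]
groups = candidate_action_groups(trace, ["P1", "P2", "P3", "P4", "P5"])
print(groups)
```

Because no communication crosses between the returned groups, an error arising inside one group cannot propagate to another through the recorded events, which is the property that makes such a group a natural unit for localized recovery.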

Index Terms:
Atomic actions, concurrent systems, CSP, fault tolerance.
Citation:
Andrew M. Tyrrell, Geof F. Carpenter, "CSP Methods for Identifying Atomic Actions in the Design of Fault Tolerant Concurrent Systems," IEEE Transactions on Software Engineering, vol. 21, no. 7, pp. 629-639, July 1995, doi:10.1109/32.392983