The Guardian Model and Primitives for Exception Handling in Distributed Systems
December 2004 (vol. 30 no. 12)
pp. 1008-1022
This paper presents an abstraction called the guardian for exception handling in distributed and concurrent systems that use coordinated exception handling. The model addresses two fundamental problems with distributed exception handling in a group of asynchronous processes. The first is to perform recovery when multiple exceptions are signaled concurrently. The second is to determine the correct context in which a process should execute its exception-handling actions. Several schemes have been proposed to address these problems. They structure a distributed program as atomic actions, based on conversations or transactions, and resolve multiple concurrent exceptions into a single one. The guardian in a distributed program represents the abstraction of a global exception handler, which encapsulates rules for handling concurrent exceptions and for directing each process to the semantically correct context in which to execute its recovery actions. The paper presents the guardian's programming primitives and the underlying distributed execution model. In contrast to the existing approaches, this model is more basic and can be used to implement or enhance the existing schemes. Several examples illustrate the capabilities of the model. Finally, its advantages and limitations are discussed in contrast to existing approaches.
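The idea of a global handler that applies one rule to concurrently signaled exceptions and directs each process to a recovery context can be illustrated with a small single-process sketch. It is not the paper's primitive set or execution model: the names `Guardian`, `Participant`, `signal`, `resolve`, and `abort_all_rule` are hypothetical, contexts are plain strings on a stack, and the message passing between distributed processes is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A process as seen by the guardian (hypothetical model)."""
    name: str
    contexts: list = field(default_factory=list)   # stack of enabled context names
    delivered: list = field(default_factory=list)  # (exception, context) pairs received


class Guardian:
    """Hypothetical global handler: collects signals, then applies one rule."""

    def __init__(self, rule):
        # rule: callable (signals, participants) -> {name: (exception, context)}
        self.rule = rule
        self.participants = {}
        self.pending = []  # (signaller name, exception name) pairs

    def join(self, participant):
        self.participants[participant.name] = participant

    def signal(self, who, exception):
        # Record the exception; nothing is delivered until resolve() runs.
        self.pending.append((who, exception))

    def resolve(self):
        # Apply the rule once to all concurrently pending signals.
        directives = self.rule(list(self.pending), self.participants)
        self.pending.clear()
        for name, (exception, context) in directives.items():
            p = self.participants[name]
            # Unwind the participant's context stack to the target context.
            while p.contexts and p.contexts[-1] != context:
                p.contexts.pop()
            p.delivered.append((exception, context))
        return directives


def abort_all_rule(signals, participants):
    """Example rule (an assumption): any signal makes every participant
    handle "Abort" in its outermost context."""
    if not signals:
        return {}
    return {name: ("Abort", p.contexts[0]) for name, p in participants.items()}


guardian = Guardian(abort_all_rule)
a = Participant("A", ["main", "transfer"])
b = Participant("B", ["main", "transfer", "commit"])
guardian.join(a)
guardian.join(b)
guardian.signal("A", "Timeout")    # two exceptions raised concurrently
guardian.signal("B", "DiskFull")
directives = guardian.resolve()
```

In this sketch the rule collapses both signals into a single `Abort` handled in each participant's outermost context. A different rule could just as well return distinct exceptions and target contexts per participant, which is the flexibility the abstract attributes to the guardian.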

Index Terms:
Concurrent programming, distributed programming, fault tolerance.
Citation:
Robert Miller, Anand Tripathi, "The Guardian Model and Primitives for Exception Handling in Distributed Systems," IEEE Transactions on Software Engineering, vol. 30, no. 12, pp. 1008-1022, Dec. 2004, doi:10.1109/TSE.2004.106