Concurrent Exception Handling and Resolution in Distributed Object Systems
October 2000 (vol. 11 no. 10)
pp. 1019-1032

Abstract: We address the problem of how to handle exceptions in distributed object systems. In a distributed computing environment, exceptions may be raised simultaneously in different processing nodes and thus need to be treated in a coordinated manner. Mishandling concurrent exceptions can lead to catastrophic consequences. We take two kinds of concurrency into account: 1) several objects are designed collectively and invoked concurrently to achieve a global goal, and 2) multiple objects (or object groups) that are designed independently compete for the same system resources. We propose a new distributed algorithm for resolving concurrent exceptions and show that the algorithm works correctly even in complex nested situations. It improves on previous proposals in that it requires only $O(n_{\max} N^2)$ messages, thereby permitting a quicker response to exceptions.
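
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the resolution step that any such scheme needs: given several exceptions raised concurrently, choose one exception that covers them all. It assumes a tree-shaped exception hierarchy, and every name in it (`PARENT`, `ancestors`, `resolve`, and the exception names) is hypothetical. It is a single-process illustration, not the distributed message-passing algorithm proposed in the paper.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical exception hierarchy: child -> parent (the root has parent None).
PARENT: Dict[str, Optional[str]] = {
    "universal_exception": None,
    "device_failure": "universal_exception",
    "sensor_failure": "device_failure",
    "motor_failure": "device_failure",
    "timeout": "universal_exception",
}


def ancestors(exc: str) -> List[str]:
    """Return exc followed by its ancestors, nearest first, up to the root."""
    chain: List[str] = []
    node: Optional[str] = exc
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def resolve(raised: Iterable[str]) -> str:
    """Pick the lowest exception in the tree that covers every raised one."""
    raised = list(raised)
    if not raised:
        raise ValueError("no exceptions to resolve")
    # Start from the first exception's ancestor chain and keep only the
    # entries that also cover each of the remaining exceptions.
    common = ancestors(raised[0])
    for exc in raised[1:]:
        covering = set(ancestors(exc))
        common = [a for a in common if a in covering]
    # The chain is ordered nearest first, so the head is the most specific
    # exception that covers all of the raised ones.
    return common[0]
```

In this sketch, two sibling failures resolve to their shared parent, while unrelated exceptions resolve to the root; in a distributed setting each participating object would then run the handler for the resolved exception.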

Index Terms:
Concurrent exception handling, distributed systems, exception resolution, nested atomic actions, object-oriented programming.
Jie Xu, Alexander Romanovsky, Brian Randell, "Concurrent Exception Handling and Resolution in Distributed Object Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 10, pp. 1019-1032, Oct. 2000, doi:10.1109/71.888642