Experiences, Strategies, and Challenges in Building Fault-Tolerant CORBA Systems
May 2004 (vol. 53 no. 5)
pp. 497-511
It has been almost a decade since the earliest reliable CORBA implementation and, despite the adoption of the Fault-Tolerant CORBA (FT-CORBA) standard by the Object Management Group, CORBA is still not considered the preferred platform for building dependable distributed applications. Among the obstacles to FT-CORBA's widespread deployment are the complexity of the new standard, the lack of understanding in implementing and deploying reliable CORBA applications, and the fact that current FT-CORBA implementations do not lend themselves readily to complex, real-world applications. In this paper, we candidly share our independent experiences as developers of two distinct reliable CORBA infrastructures (OGS and Eternal) and as contributors to the FT-CORBA standardization process. Our objective is to reveal the intricacies, challenges, and strategies in developing fault-tolerant CORBA systems, including our own. Starting with an overview of the new FT-CORBA standard, we discuss its limitations, along with techniques for best exploiting it. We reflect on the difficulties that we have encountered in building dependable CORBA systems, the solutions that we developed to address these challenges, and the lessons that we learned. Finally, we highlight some of the open issues, such as nondeterminism and partitioning, that remain to be resolved.

Index Terms:
CORBA, FT-CORBA, fault tolerance, nondeterminism, replication, recovery, OGS, Eternal.
Citation:
Pascal Felber, Priya Narasimhan, "Experiences, Strategies, and Challenges in Building Fault-Tolerant CORBA Systems," IEEE Transactions on Computers, vol. 53, no. 5, pp. 497-511, May 2004, doi:10.1109/TC.2004.1275293