AQuA: An Adaptive Architecture that Provides Dependable Distributed Objects
January 2003 (vol. 52 no. 1)
pp. 31-50

Abstract—Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects. The AQuA architecture allows application programmers to request desired levels of dependability during applications' runtimes. It provides fault tolerance mechanisms to ensure that a CORBA client can always obtain reliable services, even if the CORBA server object that provides the desired services suffers from crash failures and value faults. AQuA includes a replicated dependability manager that provides dependability management by configuring the system in response to applications' requests and changes in system resources due to faults. It uses Maestro/Ensemble to provide group communication services. It contains a gateway to intercept standard CORBA IIOP messages to allow any standard CORBA application to use AQuA. It provides different types of replication schemes to forward messages reliably to the remote replicated objects. All of the replication schemes ensure strong data consistency among replicas. This paper describes the AQuA architecture and presents, in detail, the active replication pass-first scheme. In addition, the interface to the dependability manager and the design of the dependability manager replication are also described. Finally, we describe performance measurements that were conducted for the active replication pass-first scheme, and we present results from our study of fault detection, recovery, and blocking times.

Index Terms:
Dependable distributed systems, replication protocols, adaptive fault-tolerant systems, CORBA, group communication.
Citation:
Yansong (Jennifer) Ren, David E. Bakken, Tod Courtney, Michel Cukier, David A. Karr, Paul Rubel, Chetan Sabnis, William H. Sanders, Richard E. Schantz, Mouna Seri, "AQuA: An Adaptive Architecture that Provides Dependable Distributed Objects," IEEE Transactions on Computers, vol. 52, no. 1, pp. 31-50, Jan. 2003, doi:10.1109/TC.2003.1159752