Dynamic Configuration Management in Reliable Distributed Real-Time Information Systems
January/February 1999 (vol. 11 no. 1)
pp. 239-254

Abstract: Large-scale information systems emerging in challenging application fields must meet stringent requirements for reliability, maintainability, and bounded service interruption. Their operations consist entirely or partly of distributed real-time manipulation of data objects. This paper presents a new architecture for such systems. Its original aspects lie mainly in two parts: 1) the time-triggered message-triggered object (TMO) structuring of the middleware and application software of distributed real-time information systems; and 2) the dynamic configuration management subsystem (DCMS), based on the supervisor-based network surveillance (SNS) scheme. The paper first discusses the positive impact of TMO structuring on maintainability and service interruption bounds, using distributed replicated information service systems and other systems as examples. The main discussion then turns to the DCMS architecture, in particular a formal presentation of its key component, the SNS scheme. As a component of DCMS, the network surveillance (NS) subsystem enables each interested fault-free node to learn quickly of faults and repair-completion events occurring elsewhere in the system. Concrete real-time NS schemes that are effective in distributed systems built on point-to-point network architectures are currently scarce. The SNS scheme presented here is a semicentralized real-time NS scheme that is effective in a variety of point-to-point networks and is highly scalable. The paper also presents an efficient implementation model for the SNS scheme that can be easily adapted to various commercial operating system kernels, together with a formal analysis, based on that model, that derives tight and strongly competitive bounds on its fault detection latency. Finally, it discusses some DCMS implementation issues that remain to be addressed in future research.
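The abstract describes network surveillance only at a high level: a supervisor learns of faults and repair completions and relays them to interested fault-free nodes within a bounded detection latency. The sketch below illustrates that general idea with a plain heartbeat-and-timeout supervisor. It is an assumption-laden toy model, not the paper's SNS protocol or implementation model. The class and method names, the heartbeat period P, the delay bound D, the sweep interval C, and the elementary bound P + 2D + C are all hypothetical and hold only for this model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# (event kind, node id, time at which the supervisor noticed it)
Event = Tuple[str, str, float]


@dataclass
class Supervisor:
    """Hypothetical heartbeat-and-timeout supervisor (illustration only)."""
    heartbeat_period: float  # P: every live node sends a heartbeat each P time units
    max_delay: float         # D: assumed upper bound on heartbeat transit delay
    last_seen: Dict[str, float] = field(default_factory=dict)
    faulty: Set[str] = field(default_factory=set)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    @property
    def timeout(self) -> float:
        # Under these assumptions a live node's heartbeats arrive at most
        # P + D apart, so any longer silence is treated as a fault.
        return self.heartbeat_period + self.max_delay

    def register(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        # "Interested" nodes register a callback to hear about fault/repair events.
        self.subscribers.append(callback)

    def _notify(self, event: Event) -> None:
        for callback in self.subscribers:
            callback(event)

    def on_heartbeat(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now
        if node_id in self.faulty:
            # A heartbeat from a node marked faulty signals repair completion.
            self.faulty.discard(node_id)
            self._notify(("repaired", node_id, now))

    def check(self, now: float) -> None:
        # Periodic sweep: flag every node silent for longer than the timeout.
        for node_id, seen in self.last_seen.items():
            if node_id not in self.faulty and now - seen > self.timeout:
                self.faulty.add(node_id)
                self._notify(("faulty", node_id, now))


def worst_case_detection_latency(period: float, max_delay: float,
                                 check_interval: float) -> float:
    # Toy-model bound measured from the crash, ignoring notification delay:
    # the last heartbeat may arrive D late, the timeout adds P + D, and the
    # next sweep may come up to C after that, giving P + 2D + C.
    return period + 2 * max_delay + check_interval


def simulate() -> List[Event]:
    # Integer time units; node "B" goes silent after t=200 and recovers at t=500.
    sup = Supervisor(heartbeat_period=100, max_delay=50)
    events: List[Event] = []
    sup.subscribe(events.append)
    for node in ("A", "B"):
        sup.register(node, now=0)
    for t in range(0, 601, 25):  # one supervisor sweep every C = 25
        if t > 0 and t % 100 == 0:
            sup.on_heartbeat("A", t)
            if t <= 200 or t >= 500:
                sup.on_heartbeat("B", t)
        sup.check(t)
    return events
```

In the simulated run, node B's last heartbeat before the outage arrives at t = 200, and the supervisor reports it faulty at the first sweep after t = 350 (t = 375), then reports its repair when heartbeats resume at t = 500. The observed latency of 175 time units stays within the toy bound P + 2D + C = 225 for P = 100, D = 50, C = 25. Concentrating the timeout logic in one supervisor means each interested node needs a single subscription rather than monitoring every peer; that is one plausible reading of why a semicentralized design scales, not the paper's stated argument.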

Index Terms:
Object, distributed computing, information service systems, real time, TMO, time-triggered, message-triggered, configuration management, network surveillance, point-to-point networks, fault detection latency, latency bound, supervisor.
K.H. (Kane) Kim, Chittur Subbaraman, "Dynamic Configuration Management in Reliable Distributed Real-Time Information Systems," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 1, pp. 239-254, Jan.-Feb. 1999, doi:10.1109/69.755632