Issue No. 01 - January/February (1999 vol. 11)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.755632
<p><b>Abstract</b>—Large-scale information systems emerging in challenging application fields must meet high standards of reliability and maintainability, as well as requirements on service interruption bounds. Their operations consist entirely or partially of distributed real-time manipulation of data objects. This paper presents a new architecture for such systems. Its original aspects lie mainly in two parts: 1) the time-triggered message-triggered object (TMO) structuring of the middleware and the application software of distributed real-time information systems; and 2) the dynamic configuration management subsystem (DCMS), based on the supervisor-based network surveillance (SNS) scheme. The positive impacts of this TMO structuring on maintainability and service interruption bounds are discussed first, with distributed replicated information service systems and other systems as examples. The main discussion then turns to the DCMS architecture, in particular a formal presentation of its key component, the SNS scheme. As a component of DCMS, the network surveillance (NS) subsystem enables each interested fault-free node to learn quickly of faults and repair completion events occurring in other parts of the system. Currently, concrete real-time NS schemes effective in distributed systems based on point-to-point network architectures are scarce. The SNS scheme presented in this paper is a semicentralized real-time NS scheme effective in a variety of point-to-point networks. This scheme is highly scalable. An efficient implementation model for the SNS scheme is presented that can be easily adapted to various commercial operating system kernels. The paper also presents a formal analysis of the SNS scheme, based on the implementation model, that yields strongly competitive tight bounds on its fault detection latency. Finally, some DCMS implementation issues that remain to be addressed in future research are discussed.</p>
Index Terms: Object, distributed computing, information service systems, real time, TMO, time-triggered, message-triggered, configuration management, network surveillance, point-to-point networks, fault detection latency, latency bound, supervisor.
K.H. (Kane) Kim, Chittur Subbaraman, "Dynamic Configuration Management in Reliable Distributed Real-Time Information Systems", IEEE Transactions on Knowledge & Data Engineering, vol. 11, no. 1, pp. 239-254, January/February 1999, doi:10.1109/69.755632