Design and Evaluation of a Window-Consistent Replication Service
September 1997 (vol. 46 no. 9)
pp. 986-996

Abstract—Real-time applications typically operate under strict timing and dependability constraints. Although traditional data replication protocols provide fault tolerance, real-time guarantees require bounded overhead for managing this redundancy. This paper presents the design and evaluation of a window-consistent primary-backup replication service that provides timely availability of the repository by relaxing the consistency of the replicated data. The service guarantees controlled inconsistency by scheduling update transmissions from the primary to the backup(s); this ensures that client applications interact with a window-consistent repository when a backup must supplant a failed primary. Experiments on our prototype implementation, on a network of Intel-based PCs running RT-Mach, show that the service handles a range of client loads while maintaining bounds on temporal inconsistency.
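The abstract describes the scheduling mechanism only at a high level. The sketch below shows one way a primary could schedule update transmissions so that each backup copy stays within a time window. It is a minimal model under stated assumptions, not the authors' algorithm. It assumes, hypothetically, the following:

- Each object i has a window δ_i and a transmission cost e_i in integer time units.
- The primary sends object i once per period p_i = ⌊δ_i/2⌋. A send released at k·p_i starts no earlier than k·p_i, and the next one finishes by (k+2)·p_i, so the backup's copy is never more than 2·p_i ≤ δ_i stale.
- Sends are scheduled by preemptive earliest-deadline-first (EDF) on a single link.

The names `ReplicatedObject`, `update_period`, `edf_schedulable`, `simulate_edf`, and `max_staleness` are illustrative only.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import lcm


@dataclass(frozen=True)
class ReplicatedObject:
    """Hypothetical model of one object in the primary's repository."""
    name: str
    window: int     # delta_i: max staleness tolerated at the backup (time units)
    send_cost: int  # e_i: time units needed to transmit one update


def update_period(obj: ReplicatedObject) -> int:
    # Assumption: one transmission per period p = floor(delta / 2). The backup's
    # copy reflects the primary's state when the last completed send started;
    # the next send finishes at most 2p later, and 2p <= delta.
    return obj.window // 2


def edf_schedulable(objects) -> bool:
    # Utilization test for preemptive EDF with implicit deadlines on a single
    # resource: every transmission meets its deadline iff sum(e_i / p_i) <= 1.
    return sum(Fraction(o.send_cost, update_period(o)) for o in objects) <= 1


def simulate_edf(objects, horizon):
    """Unit-time preemptive EDF; returns {name: [(start, completion), ...]}."""
    pending = []  # entries: [deadline, name, remaining, start]
    history = {o.name: [] for o in objects}
    for t in range(horizon):
        for o in objects:
            p = update_period(o)
            if t % p == 0:  # release a new transmission job for this object
                pending.append([t + p, o.name, o.send_cost, None])
        if not pending:
            continue
        job = min(pending, key=lambda j: j[0])  # earliest deadline first
        if job[3] is None:
            job[3] = t  # the primary's value is sampled when the send starts
        job[2] -= 1
        if job[2] == 0:
            pending = [j for j in pending if j is not job]
            history[job[1]].append((job[3], t + 1))
    return history


def max_staleness(history):
    """Worst-case age of each backup copy just before each update lands."""
    worst = {}
    for name, jobs in history.items():
        prev_start, w = 0, 0  # assume the backup is synchronized at t = 0
        for start, done in jobs:
            w = max(w, done - prev_start)
            prev_start = start
        worst[name] = w
    return worst


objs = [ReplicatedObject("A", 8, 1),
        ReplicatedObject("B", 12, 2),
        ReplicatedObject("C", 20, 3)]
print(edf_schedulable(objs))  # True: utilization is 53/60
horizon = 2 * lcm(*(update_period(o) for o in objs))
print(max_staleness(simulate_edf(objs, horizon)))
```

Halving each window is a conservative choice that trades extra transmissions for a simple bound. A scheduler with knowledge of actual completion times could send less often and still respect δ_i.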

Index Terms:
Real-time systems, fault tolerance, replication protocols, temporal consistency, scheduling.
Ashish Mehra, Jennifer Rexford, Farnam Jahanian, "Design and Evaluation of a Window-Consistent Replication Service," IEEE Transactions on Computers, vol. 46, no. 9, pp. 986-996, Sept. 1997, doi:10.1109/12.620480