Replica Determinism and Flexible Scheduling in Hard Real-Time Dependable Systems
February 2000 (vol. 49 no. 2)
pp. 100-111

Abstract—Fault-tolerant real-time systems are typically based on active replication, where replicated entities are required to deliver their outputs in an identical order within a given time interval. Distributed scheduling of replicated tasks, however, violates this requirement if on-line scheduling, preemptive scheduling, or scheduling of dissimilar replicated task sets is employed. This problem of inconsistent task outputs has previously been solved by coordinating the decisions of the local schedulers so that replicated tasks are executed in an identical order. Such global coordination results either in an extremely high communication effort to agree on each scheduling decision or in an overly restrictive execution model in which on-line scheduling, arbitrary preemptions, and nonidentically replicated task sets are not allowed. To overcome these restrictions, a new method, called timed messages, is introduced. Timed messages guarantee deterministic operation by presenting consistent message versions to the replicated tasks. The approach is based on simulated common knowledge and a sparse time base. Timed messages are very effective because they neither require communication between the local schedulers nor restrict the use of on-line flexible scheduling, preemptions, and nonidentically replicated task sets.
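
The idea of presenting consistent message versions can be illustrated with a small sketch. This is not the authors' algorithm; it is a minimal illustration under stated assumptions. It assumes that each version a producer writes is stamped with a validity time equal to the producer's release time plus its worst-case response time, rounded up to the next tick of a sparse time base. It also assumes that a reader always selects the newest version whose validity time is not later than the reader's own release time. The names `TimedMessage`, `to_sparse`, `GRANULE`, and `wcrt` are hypothetical. Clock synchronization and fault handling are omitted.

```python
from bisect import bisect_right

GRANULE = 10  # hypothetical sparse-time granule (e.g. milliseconds)


def to_sparse(t):
    """Round a time up to the next tick of the (assumed) sparse time base."""
    return -(-t // GRANULE) * GRANULE


class TimedMessage:
    """Hypothetical timed message: a list of versions ordered by validity time."""

    def __init__(self, initial):
        self._times = [0]          # validity times, kept sorted
        self._values = [initial]   # value belonging to each validity time

    def write(self, release, wcrt, value):
        # Validity depends only on the producer's release time and worst-case
        # response time, not on when the write actually happens locally.
        valid_from = to_sparse(release + wcrt)
        i = bisect_right(self._times, valid_from)
        self._times.insert(i, valid_from)
        self._values.insert(i, value)

    def read(self, reader_release):
        # Newest version that is already valid at the reader's release time.
        i = bisect_right(self._times, reader_release)
        return self._values[i - 1]


def run_replica(producer_first):
    """Simulate one replica whose local scheduler interleaves tasks differently."""
    msg = TimedMessage(initial="v0")
    log = []
    if producer_first:
        msg.write(release=0, wcrt=15, value="v1")
        log.append(msg.read(reader_release=10))
    else:
        log.append(msg.read(reader_release=10))
        msg.write(release=0, wcrt=15, value="v1")
    log.append(msg.read(reader_release=20))
    return log


print(run_replica(True), run_replica(False))  # ['v0', 'v1'] ['v0', 'v1']
```

In this sketch, the two replicas execute the producer and the consumer in different orders, yet both consumers observe the same sequence of versions. No messages are exchanged between the simulated schedulers, which mirrors the property claimed in the abstract.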

Index Terms:
Distributed real-time systems, fault tolerance, distributed operating systems, replica determinism, distributed scheduling, flexible scheduling.
Stefan Poledna, Alan Burns, Andy Wellings, Peter Barrett, "Replica Determinism and Flexible Scheduling in Hard Real-Time Dependable Systems," IEEE Transactions on Computers, vol. 49, no. 2, pp. 100-111, Feb. 2000, doi:10.1109/12.833107