GUARDS: A Generic Upgradable Architecture for Real-Time Dependable Systems
June 1999 (vol. 10 no. 6)
pp. 580-599

Abstract—The development and validation of fault-tolerant computers for critical real-time applications are currently both costly and time consuming. Often, the underlying technology is out-of-date by the time the computers are ready for deployment. Obsolescence can become a chronic problem when the systems in which they are embedded have lifetimes of several decades. This paper gives an overview of the work carried out in a project that is tackling the issues of cost and rapid obsolescence by defining a generic fault-tolerant computer architecture based essentially on commercial off-the-shelf (COTS) components (both processor hardware boards and real-time operating systems). The architecture uses a limited number of specific, but generic, hardware and software components to implement an architecture that can be configured along three dimensions: redundant channels, redundant lanes, and integrity levels. The two dimensions of physical redundancy allow the definition of a wide variety of instances with different fault tolerance strategies. The integrity level dimension allows application components of different levels of criticality to coexist in the same instance. The paper describes the main concepts of the architecture, the supporting environments for development and validation, and the prototypes currently being implemented.

Index Terms:
Computer architecture, generic architecture, embedded systems, fault tolerance, real-time, integrity levels.
Citation:
D. Powell, J. Arlat, L. Beus-Dukic, A. Bondavalli, P. Coppola, A. Fantechi, E. Jenn, C. Rabéjac, A. Wellings, "GUARDS: A Generic Upgradable Architecture for Real-Time Dependable Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 6, pp. 580-599, June 1999, doi:10.1109/71.774908