Application-Level Diagnostic and Membership Protocols for Generic Time-Triggered Systems
March/April 2011 (vol. 8 no. 2)
pp. 177-193
Marco Serafini, TU Darmstadt, Darmstadt
Péter Bokor, TU Darmstadt, Darmstadt
Neeraj Suri, TU Darmstadt, Darmstadt
Jonny Vinter, SP Swedish National Testing and Research Institute, Boras
Astrit Ademaj, TU Wien, Vienna
Wolfgang Brandstätter, AUDI, Ingolstadt
Fulvio Tagliabò, Centro Ricerche Fiat, Orbassano
Jens Koch, Airbus Deutschland, Hamburg
We present online-tunable diagnostic and membership protocols for generic time-triggered (TT) systems that detect crashes, send/receive omission faults, and network partitions. Unlike existing diagnostic and membership protocols for TT systems, our protocols do not rely on the single-fault assumption and also tolerate non-fail-silent (Byzantine) faults. They run at the application level and can be added on top of any TT system (possibly as a middleware component) without requiring modifications at the system level. To handle transient faults, information on detected faults is accumulated using a penalty/reward algorithm. The likelihood that a node is isolated after a fault is detected can be adapted to different system configurations, including configurations that integrate functions with different criticality levels. All protocols are formally verified using model checking. Using actual automotive and aerospace parameters, we also experimentally demonstrate the transient-fault handling capabilities of the protocols.
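The abstract states that detected faults are accumulated with a penalty/reward algorithm so that transient faults do not immediately cause node isolation. Below is a minimal Python sketch of a generic penalty/reward filter of this kind, written under stated assumptions: the class name `PenaltyReward`, the thresholds `penalty_threshold` (P) and `reward_threshold` (R), and the per-fault `criticality` weight are hypothetical illustrations, not the paper's actual interface, algorithm details, or parameter values. Each round, the diagnosis outcome for one node (faulty or not) is fed in; faults raise a penalty counter, a run of fault-free rounds resets it, and the node is isolated once the penalty reaches P.

```python
from dataclasses import dataclass


@dataclass
class PenaltyReward:
    """Hypothetical per-node penalty/reward filter (generic sketch, not the paper's exact algorithm)."""

    penalty_threshold: int  # P (assumed): isolate the node once its penalty reaches this value
    reward_threshold: int   # R (assumed): consecutive fault-free rounds that clear the penalty
    criticality: int = 1    # assumed penalty weight added per detected fault
    penalty: int = 0
    reward: int = 0
    isolated: bool = False

    def update(self, faulty: bool) -> bool:
        """Feed one round's diagnosis for this node; return True if the node is isolated."""
        if self.isolated:
            return True  # isolation is permanent in this sketch
        if faulty:
            self.penalty += self.criticality
            self.reward = 0  # a fault breaks the streak of correct rounds
            if self.penalty >= self.penalty_threshold:
                self.isolated = True
        elif self.penalty > 0:
            self.reward += 1
            if self.reward >= self.reward_threshold:
                # Enough correct rounds: treat earlier faults as transient and forget them.
                self.penalty = 0
                self.reward = 0
        return self.isolated
```

With P = 3 and R = 4, a single fault followed by four correct rounds is forgotten, whereas three faults arriving within the reward window isolate the node. In this sketch, raising `criticality` or lowering `penalty_threshold` makes isolation more likely, and a larger `reward_threshold` keeps past faults in memory longer; tuning such parameters per function is one plausible way to reflect different criticality levels.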

Index Terms:
Diagnosis, membership, time-triggered systems, transient faults.
Marco Serafini, Péter Bokor, Neeraj Suri, Jonny Vinter, Astrit Ademaj, Wolfgang Brandstätter, Fulvio Tagliabò, Jens Koch, "Application-Level Diagnostic and Membership Protocols for Generic Time-Triggered Systems," IEEE Transactions on Dependable and Secure Computing, vol. 8, no. 2, pp. 177-193, March-April 2011, doi:10.1109/TDSC.2010.23