A Design Diversity Metric and Analysis of Redundant Systems
May 2002 (vol. 51 no. 5)
pp. 498-510

Redundant systems are designed using multiple copies of the same resource (e.g., a logic network or a software module) in order to increase system dependability. Design diversity has long been used to protect redundant systems from common-mode failures. The conventional notion of diversity relies on independent generation of different implementations. This concept is qualitative and does not provide a basis for comparing the reliabilities of two diverse systems. In this paper, for the first time, we present a metric to quantify diversity among several designs and illustrate its effectiveness using several examples. Applications of this metric in analyzing reliability and availability of diverse redundant systems, and deriving simple relationships between diversity, system failure rate, and mission time are also demonstrated.

Index Terms:
Error detection, design diversity, common-mode failures, fault-tolerant computing, dependability
Citation:
S. Mitra, N.R. Saxena, E.J. McCluskey, "A Design Diversity Metric and Analysis of Redundant Systems," IEEE Transactions on Computers, vol. 51, no. 5, pp. 498-510, May 2002, doi:10.1109/TC.2002.1004589