Reaching Fault Diagnosis Agreement under a Hybrid Fault Model
September 2000 (vol. 49 no. 9)
pp. 980-986

Abstract: The goal of the fault diagnosis agreement (FDA) problem is to make each fault-free processor detect/locate a common set of faulty processors. The problem is examined for processors under a mixed fault model (also referred to as a hybrid fault model). An evidence-based fault diagnosis protocol is proposed to solve the FDA problem. The proposed protocol first collects, as evidence, the messages accumulated during the Byzantine agreement protocol. By examining the collected evidence, a fault-free processor can detect/locate which processors are faulty. The network can then be reconfigured by removing the detected faulty processors, together with the links connected to them. The proposed protocol can detect/locate the maximum number of faulty processors when solving the FDA problem.
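
The reconfiguration step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's protocol: it assumes the fault-free processors have already agreed on a common set of faulty processors, and it only shows the removal of those processors and their links. The adjacency-map representation, the function name `reconfigure`, and the processor labels are hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, Iterable, Set

Network = Dict[Hashable, Set[Hashable]]


def reconfigure(network: Network, faulty: Iterable[Hashable]) -> Network:
    """Return a copy of the network without the faulty processors and their links.

    `network` maps each processor to the set of processors it is linked to.
    `faulty` is the agreed set of faulty processors (assumed to be given).
    """
    faulty_set = set(faulty)
    return {
        processor: {peer for peer in peers if peer not in faulty_set}
        for processor, peers in network.items()
        if processor not in faulty_set
    }


# Hypothetical fully connected network of four processors.
network = {
    "p1": {"p2", "p3", "p4"},
    "p2": {"p1", "p3", "p4"},
    "p3": {"p1", "p2", "p4"},
    "p4": {"p1", "p2", "p3"},
}

# Assumed outcome of the diagnosis phase: every fault-free processor agrees on this set.
agreed_faulty = {"p3"}

reconfigured = reconfigure(network, agreed_faulty)
```

Because every fault-free processor holds the same set of faulty processors, each one that applies this removal locally ends up with the same reconfigured topology.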


Index Terms:
Byzantine agreement, fault diagnosis agreement, fault-tolerant distributed system, hybrid fault model, mixed fault model.
Citation:
Hsien-Sheng Hsiao, Yeh-Hao Chin, Wei-Pang Yang, "Reaching Fault Diagnosis Agreement under a Hybrid Fault Model," IEEE Transactions on Computers, vol. 49, no. 9, pp. 980-986, Sept. 2000, doi:10.1109/TC.2000.10005