Distributed Fault Diagnosis in Multistage Network-Based Multiprocessors
September 1995 (vol. 44 no. 9)
pp. 1085-1095

Abstract: This paper presents a distributed, system-level fault diagnosis scheme for multistage network-based multiprocessors. The target system, chosen as a representative, employs a multistage interconnection network (MIN) built from 4 × 4 switching elements. We propose a fast diagnostic method that uses a quadtree [1] and its coupler structure. These two quadtree structures partition the system into a number of Link-Independent Groups (LIGs). This partitioning provides an important diagnostic property: the communication paths in each LIG are either identical or disjoint. Several previous works on fault diagnosis investigated only the multistage interconnection network. This paper addresses diagnosis of the entire multiprocessor, including the detection and location of single faults caused by processor nodes, switching elements, and communication links. In addition, the diagnosis of a group of multiple faults partitioned by the tree structures is also discussed.
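The "identical or disjoint" property can be illustrated with a small sketch. It assumes an omega-style network of 4 × 4 switches with destination-tag routing, which is a simplification chosen for illustration and not necessarily the topology or the quadtree/LIG construction used in the paper. The names `path_links` and `identical_or_disjoint` are hypothetical.

```python
from itertools import combinations

RADIX = 4  # 4 x 4 switching elements, as in the abstract


def path_links(src, dst, stages, radix=RADIX):
    """Return the links used from src to dst as (level, wire) pairs.

    Level 0 is the input link at the source; level k (k >= 1) is the
    output link of stage k. Assumed omega-style model, for illustration.
    """
    size = radix ** stages
    links = [(0, src)]
    wire = src
    for k in range(1, stages + 1):
        # k-th destination digit, most significant first
        digit = (dst // radix ** (stages - k)) % radix
        # Shuffle (shift left by one base-4 digit), then the switch
        # selects the output port named by the destination digit.
        wire = (wire * radix) % size + digit
        links.append((k, wire))
    return links


def identical_or_disjoint(pairs, stages):
    """True if every two paths in `pairs` share all of their links or none."""
    paths = [set(path_links(s, d, stages)) for s, d in pairs]
    return all(a == b or not (a & b) for a, b in combinations(paths, 2))


print(path_links(5, 14, stages=2))  # [(0, 5), (1, 7), (2, 14)]
print(identical_or_disjoint([(0, 0), (5, 5)], stages=2))  # True
print(identical_or_disjoint([(0, 0), (1, 0)], stages=2))  # False
```

In this model, a group of source/destination pairs that passes the check has no partially overlapping paths, so a failed test path implicates only links that no other distinct path in the group uses.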

[1] W. Lin, T.L. Sheu, C.R. Das, T.Y. Feng, and C.L. Wu, “A conflict-free routing scheme on multistage interconnection networks,” IEEE Trans. Computers, vol. 38, no. 8, pp. 1086-1097, Aug. 1989.
[2] T.Y. Feng and C.L. Wu, “Fault-diagnosis for a class of multistage interconnection networks,” IEEE Trans. Computers, vol. 30, no. 10, pp. 743-758, Oct. 1981.
[3] W.K. Huang and F. Lombardi, “On the constant diagnosability of Baseline interconnection network,” IEEE Trans. Computers, vol. 39, no. 12, pp. 1485-1488, Dec. 1990.
[4] F. Lombardi, C. Feng, and W.-K. Huang, “Detection and Location of Multiple Faults in Baseline Interconnection Networks,” IEEE Trans. Computers, vol. 41, no. 10, pp. 1340-1344, Oct. 1992.
[5] S. Thanawastien and V.P. Nelson, “Distributed path testing in a shuffle, exchange network based on a write/verify approach,” Proc. 1983 IEEE Real-Time Systems Symp., pp. 131-140.
[6] S. Thanawastien and V.P. Nelson, “Optimal fault detection test sequences for shuffle/exchange networks,” Proc. 1983 IEEE Fault-Tolerant Computing, pp. 442-445.
[7] S. Thanawastien and V.P. Nelson, “Diagnosis of multiple faults in shuffle, exchange networks,” Proc. 1984 IEEE Real-Time Systems Symp., pp. 184-192.
[8] E. Opper and G.J. Lipovski, “Fault diagnosis in non-rectangular interconnection networks,” Proc. 1983 IEEE Real-Time Systems Symp., pp. 141-149.
[9] M. Malek and E. Opper, “Multiple fault diagnosis of SW-Banyan networks,” Proc. 1983 IEEE Fault-Tolerant Computing, pp. 446-449.
[10] V. Cherkassky and E. Opper, “Fault diagnosis and permuting properties of CC-Banyan networks,” Proc. 1984 IEEE Real-Time Systems Symp., pp. 175-183.
[11] L. Ciminiera, “Design for diagnosability issues in rectangular Banyan networks,” Proc. 1984 IEEE Fault-Tolerant Computing, pp. 178-183.
[12] E. Dilger and E. Ammann, “System-level self-diagnosis in n-cube-connected multiprocessor networks,” Proc. 1984 IEEE Fault-Tolerant Computing, pp. 184-189.
[13] R. Gupta and I.V. Ramakrishnan, “System-level fault diagnosis in malicious environments,” Proc. 1987 IEEE Fault-Tolerant Computing, pp. 184-189.
[14] C.L. Yang and G.M. Masson, “An efficient algorithm for multiprocessor fault diagnosis using the comparison approach,” Proc. 1986 IEEE Fault-Tolerant Computing, pp. 238-243.
[15] A. Sengupta and A. Dahbura, “On Self-Diagnosable Multiprocessor Systems: Diagnosis by the Comparison Approach,” IEEE Trans. Computers, vol. 41, no. 11, pp. 1386-1396, Nov. 1992.
[16] D.M. Blough and A. Pelc, “Diagnosis and repair in multiprocessor systems,” IEEE Trans. Computers, vol. 42, no. 2, pp. 205-217, Feb. 1993.
[17] W. Crowther, J. Goodhue, E. Starr, R. Thomas, W. Milliken, and T. Blackadar, “Performance Measurements on a 128-Node Butterfly Parallel Processor,” Proc. 1985 Int’l Conf. Parallel Processing, pp. 531-540.
[18] R.H. Thomas, “Behavior of the Butterfly Parallel Processor in the Presence of Memory Hot Spots,” Proc. 1986 Int’l Conf. Parallel Processing, pp. 46-50.
[19] BBN Laboratories, “Butterfly Parallel Processor overview,” Report 6148, Version 1, Cambridge, Mass., Mar. 1986.
[20] N.F. Tzeng, P.C. Yew, and C.Q. Zhu, “Fault-diagnosis in multiple-path interconnection network,” Proc. 1986 IEEE Fault-Tolerant Computing, pp. 98-103.
[21] W. Lin and C.L. Wu, “Reconfiguration procedures for a polymorphic and partitionable multiprocessor,” IEEE Trans. Computers, vol. 35, no. 10, pp. 910-916, Oct. 1986.

Index Terms:
Contention-free routing, fault detection, fault diagnosis, fault location, multistage interconnection networks, quadtree structures.
Citation:
Woei Lin, Tsang-Ling Sheu, Chita R. Das, "Distributed Fault Diagnosis in Multistage Network-Based Multiprocessors," IEEE Transactions on Computers, vol. 44, no. 9, pp. 1085-1095, Sept. 1995, doi:10.1109/12.464387