Better Adaptive Diagnosis of Hypercubes
October 2000 (vol. 49 no. 10)
pp. 1013-1020

Abstract: We consider the problem of adaptive fault diagnosis in hypercube multiprocessor systems. Processors perform tests on one another, and later tests can be scheduled on the basis of previous test results. Fault-free testers correctly identify the fault status of tested processors, while faulty testers can give arbitrary test results. The goal is to correctly identify the status of all processors, assuming that the number of faults does not exceed the hypercube dimension. We propose an adaptive diagnosis algorithm whose efficiency is drastically better than that of any previously known strategy. While the worst-case number of tests for each of those strategies exceeds $2^n \log n$ for an $n$-dimensional hypercube, our method uses at most $2^n + 3n/2$ tests in the worst case. We can also modify our algorithm to reduce the number of testing rounds. By slightly increasing the number of tests to $2^n + (n+1)^2$ (still much better than $2^n \log n$), we can carry out diagnosis in at most 11 rounds in the worst case (as opposed to over $n$ rounds in the best previously known strategy).
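To give a feel for the gap between the test counts quoted in the abstract, the following minimal sketch evaluates the three expressions for a few dimensions. The function names are hypothetical, and the logarithm is assumed to be base 2, which the abstract does not state. The code only evaluates the formulas; it does not implement the diagnosis algorithm.

```python
import math


def prior_bound(n: int) -> float:
    """2^n * log n, the count that earlier strategies are said to exceed.

    Assumption: log is taken base 2 (the abstract does not specify the base).
    """
    return 2**n * math.log2(n)


def new_test_bound(n: int) -> float:
    """2^n + 3n/2, the worst-case test count claimed for the new algorithm."""
    return 2**n + 3 * n / 2


def few_rounds_bound(n: int) -> int:
    """2^n + (n+1)^2, the test count of the variant using at most 11 rounds."""
    return 2**n + (n + 1) ** 2


if __name__ == "__main__":
    # Print the three expressions side by side for a few hypercube dimensions.
    for n in (8, 16, 20):
        print(n, prior_bound(n), new_test_bound(n), few_rounds_bound(n))
```

Under the base-2 assumption, the additive terms $3n/2$ and $(n+1)^2$ become negligible next to $2^n$ as $n$ grows, whereas $2^n \log n$ keeps a multiplicative $\log n$ factor.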

Index Terms:
Adaptive diagnosis, fault, hypercube, test.
Evangelos Kranakis, Andrzej Pelc, "Better Adaptive Diagnosis of Hypercubes," IEEE Transactions on Computers, vol. 49, no. 10, pp. 1013-1020, Oct. 2000, doi:10.1109/12.888036