Diagnosis of Parallel Computers with Arbitrary Connectivity
July 1999 (vol. 48 no. 7)
pp. 757-761

Abstract: Fussell and Rangarajan [7], [12], [13] considered performing multiple tests to achieve correct diagnosis of constant-degree connection structures. They showed that the probability of correctly identifying every processor approaches one as $n \longrightarrow \infty$, where $n$ is the number of processors in the system. In this paper, we give a performance analysis of the probabilistic diagnosis algorithm defined in [7], [12], [13]. We first derive a different, formal proof of the analytic expression for the probability that a fault-free processor is correctly identified. We then give an exact analytic expression for the probability that a faulty processor is correctly identified. This result improves on the previous results in [7], [12], [13]: in [7], an expression for the probability that a faulty processor is wrongly identified was used incorrectly, while in [12], [13] the same expression was treated as an upper bound on that probability (see our Remark 3.1), but no justification or proof was given. Based on our new results, we finally give an asymptotic analysis of the algorithm and demonstrate that the probability of correctly identifying every processor approaches one as $n \longrightarrow \infty$.

[1] F. Barsi, F. Grandoni, and P. Maestrini, “A Theory of Diagnosability of Digital Systems,” IEEE Trans. Computers, vol. 25, no. 6, pp. 585-593, June 1976.
[2] D.M. Blough, Fault Detection and Diagnosis in Multiprocessor Systems, PhD dissertation, The Johns Hopkins Univ., Baltimore, MD, 1988.
[3] H. Chernoff, “A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations,” Annals Math. Statistics, vol. 23, pp. 493-507, 1952.
[4] K. Chwa and S.L. Hakimi, “Schemes for Fault-Tolerant Computing: A Comparison of Modularly Redundant and $t$-Diagnosable Systems,” Information and Control, vol. 49, pp. 212-239, June 1981.
[5] A.T. Dahbura and G.M. Masson, “An $O(n^{2.5})$ Fault Identification Algorithm for Diagnosable Systems,” IEEE Trans. Computers, vol. 33, no. 1, pp. 486-492, Jan. 1984.
[6] R.M. Dudley, A Course on Empirical Process, pp. 13-16, École d'Été de Probabilités de Saint-Flour XII, 1982.
[7] D. Fussell and S. Rangarajan, “Probabilistic Diagnosis of Multiprocessor Systems with Arbitrary Connectivity,” Proc. 19th Int'l Symp. Fault-Tolerant Computing, pp. 560-565, 1989.
[8] S.L. Hakimi and A.T. Amin, “Characterization of Connection Assignment of Diagnosable Systems,” IEEE Trans. Computers, vol. 23, no. 1, pp. 86-88, Jan. 1974.
[9] J. Maeng and M. Malek, “A Comparison Connection Assignment for Self-Diagnosis of Multicomputer Systems,” Digest FTCS-11, pp. 173-175, June 1981.
[10] M. Okamoto, “Some Inequalities Relating to the Partial Sum of Binomial Probabilities,” Annals Inst. of Statistics Math., vol. 10, pp. 29-35, 1958.
[11] F.P. Preparata, G. Metze, and R.T. Chien, “On the Connection Assignment Problem of Diagnosable Systems,” IEEE Trans. Computers, vol. 16, no. 12, pp. 848-854, Dec. 1967.
[12] S. Rangarajan and D. Fussell, “Probabilistic Diagnosis Algorithms Tailored to System Topology,” Proc. IEEE CS 21st Int'l Symp. Fault-Tolerant Computing, pp. 230-237, 1991.
[13] S. Rangarajan and D. Fussell, “Diagnosing Arbitrarily Connected Parallel Computers with High Probability,” IEEE Trans. Computers, vol. 41, pp. 606-615, 1992.
[14] E. Scheinerman, “Almost Sure Fault-Tolerance in Random Graphs,” SIAM J. Computing, vol. 16, pp. 1,124-1,134, 1987.
[15] A.K. Somani, “Sequential Fault Occurrence and Reconfiguration in System Level Diagnosis,” IEEE Trans. Computers, vol. 39, no. 12, pp. 1,472-1,475, Dec. 1990.
[16] A.K. Somani, V.K. Agarwal, and D. Avis, “A Generalized Theory for System Level Diagnosis,” IEEE Trans. Computers, vol. 36, no. 5, pp. 538-546, May 1987.

Index Terms:
System-level diagnosis, performance analysis, fault identification, multiprocessor systems.
Citation:
Qian-Yu Tang, Xiaoyu Song, "Diagnosis of Parallel Computers with Arbitrary Connectivity," IEEE Transactions on Computers, vol. 48, no. 7, pp. 757-761, July 1999, doi:10.1109/12.780885