Diagnosis of Parallel Computers with Arbitrary Connectivity
July 1999 (vol. 48 no. 7)
pp. 757-761

Abstract: Fussell and Rangarajan [7], [12], [13] recently considered performing multiple tests to achieve correct diagnosis of constant-degree connection structures. They showed that the probability of correctly identifying every processor approaches one as $n \to \infty$, where $n$ is the number of processors in the system. In this paper, we give a performance analysis of the probabilistic diagnosis algorithm defined in [7], [12], [13]. We first give a different, formal proof of the analytic expression for the probability that a fault-free processor is correctly identified. We then derive an exact analytic expression for the probability that a faulty processor is correctly identified, which improves on the results in [7], [12], [13]: in [7], an expression for the probability that a faulty processor is wrongly identified was used incorrectly, while in [12], [13] the same expression was treated as an upper bound on that probability (see our Remark 3.1) without justification or proof. Finally, based on these results, we give an asymptotic analysis of the algorithm and show that the probability of correctly identifying every processor approaches one as $n \to \infty$.
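
As a rough illustration of why per-processor probabilities matter for the asymptotic claim, consider a generic union-bound argument. This is a sketch, not the paper's derivation. The symbols $A_i$ (the event that processor $i$ is correctly identified) and $\epsilon_n$ (an assumed upper bound on the probability that any single processor is misidentified) are introduced here for illustration only and do not appear in the abstract.

$$
P\Bigl(\bigcap_{i=1}^{n} A_i\Bigr)
= 1 - P\Bigl(\bigcup_{i=1}^{n} A_i^{c}\Bigr)
\ge 1 - \sum_{i=1}^{n} P\bigl(A_i^{c}\bigr)
\ge 1 - n\,\epsilon_n .
$$

Under this assumption, $n\,\epsilon_n \to 0$ is sufficient for the probability of correctly identifying every processor to approach one as $n \to \infty$. The abstract does not say whether the paper's analysis proceeds this way.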

[1] F. Barsi, F. Grandoni, and P. Maestrini, “A Theory of Diagnosability of Digital Systems,” IEEE Trans. Computers, vol. 25, no. 6, pp. 585-593, June 1976.
[2] D.M. Blough, Fault Detection and Diagnosis in Multiprocessor Systems, Ph.D. dissertation, The Johns Hopkins Univ., Baltimore, MD, 1988.
[3] H. Chernoff, “A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations,” Annals Math. Statistics, vol. 23, pp. 493-507, 1952.
[4] K. Chwa and S.L. Hakimi, “Schemes for Fault-Tolerant Computing: A Comparison of Modularly Redundant and $t$-Diagnosable Systems,” Information and Control, vol. 49, pp. 212-239, June 1981.
[5] A.T. Dahbura and G.M. Masson, “An $O(n^{2.5})$ Fault Identification Algorithm for Diagnosable Systems,” IEEE Trans. Computers, vol. 33, no. 1, pp. 486-492, Jan. 1984.
[6] R.M. Dudley, A Course on Empirical Processes, pp. 13-16, École d'Été de Probabilités de Saint-Flour XII, 1982.
[7] D. Fussell and S. Rangarajan, “Probabilistic Diagnosis of Multiprocessor Systems with Arbitrary Connectivity,” Proc. 19th Int'l Symp. Fault-Tolerant Computing, pp. 560-565, 1989.
[8] S.L. Hakimi and A.T. Amin, “Characterization of Connection Assignment of Diagnosable Systems,” IEEE Trans. Computers, vol. 23, no. 1, pp. 86-88, Jan. 1974.
[9] J. Maeng and M. Malek, “A Comparison Connection Assignment for Self-Diagnosis of Multicomputer Systems,” Digest FTCS-11, pp. 173-175, June 1981.
[10] M. Okamoto, “Some Inequalities Relating to the Partial Sum of Binomial Probabilities,” Annals Inst. of Statistical Math., vol. 10, pp. 29-35, 1958.
[11] F.P. Preparata, G. Metze, and R.T. Chien, “On the Connection Assignment Problem of Diagnosable Systems,” IEEE Trans. Computers, vol. 16, no. 12, pp. 848-854, Dec. 1967.
[12] S. Rangarajan and D. Fussell, “Probabilistic Diagnosis Algorithms Tailored to System Topology,” Proc. IEEE CS 21st Int'l Symp. Fault-Tolerant Computing, pp. 230-237, 1991.
[13] S. Rangarajan and D. Fussell, “Diagnosing Arbitrarily Connected Parallel Computers with High Probability,” IEEE Trans. Computers, vol. 41, pp. 606-615, 1992.
[14] E. Scheinerman, “Almost Sure Fault-Tolerance in Random Graphs,” SIAM J. Computing, vol. 16, pp. 1,124-1,134, 1987.
[15] A.K. Somani, “Sequential Fault Occurrence and Reconfiguration in System Level Diagnosis,” IEEE Trans. Computers, vol. 39, no. 12, pp. 1,472-1,475, Dec. 1990.
[16] A.K. Somani, V.K. Agarwal, and D. Avis, “A Generalized Theory for System Level Diagnosis,” IEEE Trans. Computers, vol. 36, no. 5, pp. 538-546, May 1987.

Index Terms:
System-level diagnosis, performance analysis, fault identification, multiprocessor systems.
Citation:
Qian-Yu Tang, Xiaoyu Song, "Diagnosis of Parallel Computers with Arbitrary Connectivity," IEEE Transactions on Computers, vol. 48, no. 7, pp. 757-761, July 1999, doi:10.1109/12.780885