Issue No.06 - June (2012 vol.23)

pp: 1047-1059

Mourad Elhadef, Abu Dhabi University, Abu Dhabi

Amiya Nayak, University of Ottawa, Ontario

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2011.248

ABSTRACT

We consider the fault identification problem, also known as system-level self-diagnosis, in multiprocessor and multicomputer systems using the comparison approach. In this diagnosis model, a set of tasks is assigned to pairs of nodes and their outcomes are compared by neighboring nodes. Given that comparisons are performed by the nodes themselves, faulty nodes can incorrectly claim that fault-free nodes are faulty or that faulty ones are fault-free. The collection of all agreements and disagreements among the nodes, i.e., the comparison outcomes, is used to identify the set of permanently faulty nodes. Since the introduction of the comparison model, significant progress has been made in both theory and practice associated with the original model and its offshoots. Nevertheless, efficiently identifying the set of faulty nodes when not all the comparison outcomes are available to the diagnosis algorithm at the beginning of the diagnosis phase, i.e., when only partial syndromes are available, remains an outstanding research issue. In this paper, we introduce a novel diagnosis approach using neural networks to solve this fault identification problem using partial syndromes. Results from a thorough simulation study demonstrate the effectiveness of the neural-network-based self-diagnosis algorithm for randomly generated diagnosable systems of different sizes and under various fault scenarios. We then conducted extensive simulations using partial syndromes and nondiagnosable systems. These simulations showed that the neural-network-based diagnosis approach provided good results, making it a viable addition or alternative to existing diagnosis algorithms.
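To make the notions of a syndrome and a partial syndrome concrete, the following is a minimal Python sketch, not the authors' simulator. It assumes a Maeng-Malek-style rule: a fault-free comparator reports 0 only when both compared neighbors are fault-free, while a faulty comparator reports an arbitrary outcome. The names `generate_syndrome` and `make_partial`, the five-node topology, and the choice of faulty node are all hypothetical and chosen purely for illustration.

```python
import random
from itertools import combinations


def generate_syndrome(adjacency, faulty, rng):
    """Return {(k, i, j): outcome} for each comparator k and neighbor pair i < j.

    Assumed rule: a fault-free comparator outputs 0 (agreement) only if both
    compared nodes are fault-free; a faulty comparator outputs a random bit.
    """
    syndrome = {}
    for k, neighbors in adjacency.items():
        for i, j in combinations(sorted(neighbors), 2):
            if k in faulty:
                outcome = rng.randint(0, 1)  # unreliable comparator
            else:
                outcome = int(i in faulty or j in faulty)
            syndrome[(k, i, j)] = outcome
    return syndrome


def make_partial(syndrome, keep_fraction, rng):
    """Keep a random subset of comparison outcomes (a partial syndrome)."""
    keys = sorted(syndrome)
    kept = rng.sample(keys, int(len(keys) * keep_fraction))
    return {key: syndrome[key] for key in kept}


# Hypothetical 5-node system given as an undirected adjacency map.
adjacency = {
    0: {1, 2, 4},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4},
    4: {0, 3},
}
faulty = {3}  # illustrative fault set
rng = random.Random(0)

syndrome = generate_syndrome(adjacency, faulty, rng)
partial = make_partial(syndrome, 0.5, rng)

print(len(syndrome), "comparison outcomes in the full syndrome")
print(len(partial), "outcomes retained in the partial syndrome")
```

A diagnosis algorithm, neural or otherwise, would take such a dictionary of outcomes as input and return a candidate fault set. With a partial syndrome, only the retained entries are available when diagnosis starts.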

INDEX TERMS

Fault tolerance, system-level self-diagnosis, comparison models, partial syndromes, neural networks.

CITATION

Mourad Elhadef, Amiya Nayak, "Comparison-Based System-Level Fault Diagnosis: A Neural Network Approach",

*IEEE Transactions on Parallel & Distributed Systems*, vol. 23, no. 6, pp. 1047-1059, June 2012, doi:10.1109/TPDS.2011.248