13th Pacific Rim International Symposium on Dependable Computing (PRDC 2007)
Fault Detection System Activated by Failure Information
Melbourne, Victoria, Australia
December 17-December 19
ISBN: 0-7695-3054-0
We propose a fault detection system that an application activates when it recognizes the occurrence of a failure, with the goal of realizing self-managing systems that automatically find the source of a failure. Existing detection systems pose three issues for constructing self-managing applications: i) the detection results are not sent to the applications, ii) they cannot identify the source failure among all of the detected failures, and iii) configuring the detection system for a networked system is hard work. To overcome these issues, the proposed system takes three approaches: i) the system receives failure information from an application and returns a result set to the application, ii) the system identifies the source failure using relationships among errors, and iii) the system obtains information about the monitored system from a database. The relationships are expressed as a tree, called an error relationship tree. The database provides information about system entities such as hardware devices, software objects, and network topology. When the proposed system starts looking for the source of a failure, it refers to the causal relations in an error relationship tree and derives the correspondence between error definitions and actual objects using the database. We show the design of the detection operation activated by the failure information and the architecture of the proposed system.
Keywords: fault detection, fault localization, error relationship tree, CIM object database, application-level failures
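The detection operation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ErrorNode`, `find_sources`, `objects_db`, and `is_faulty` are hypothetical, a plain dictionary stands in for the object database, and a set of known faults stands in for real probes. The sketch assumes that the source of a failure is the deepest confirmed error in the tree.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorNode:
    """One error definition; `causes` lists errors that may have caused it."""
    name: str
    entity_type: str  # kind of system entity the error applies to
    causes: list["ErrorNode"] = field(default_factory=list)


def find_sources(node, objects_db, is_faulty):
    """Return (error name, object) pairs for the deepest confirmed errors."""
    # Map the error definition to actual objects via the (stand-in) database.
    faulty = [obj for obj in objects_db.get(node.entity_type, [])
              if is_faulty(node.name, obj)]
    if not faulty:
        return []
    # Follow the causal relations; a confirmed cause supersedes its effect.
    deeper = []
    for cause in node.causes:
        deeper.extend(find_sources(cause, objects_db, is_faulty))
    return deeper or [(node.name, obj) for obj in faulty]


# Hypothetical tree: an application timeout may be caused by a service outage,
# which in turn may be caused by a link failure or a full disk.
root = ErrorNode("app_timeout", "application", [
    ErrorNode("service_down", "service", [
        ErrorNode("link_down", "network_link"),
        ErrorNode("disk_full", "disk"),
    ]),
])

# Hypothetical contents of the object database.
objects_db = {
    "application": ["webapp"],
    "service": ["db-server"],
    "network_link": ["eth0", "eth1"],
    "disk": ["sda"],
}

# Stand-in for real probes: the set of faults that currently hold.
observed = {("app_timeout", "webapp"),
            ("service_down", "db-server"),
            ("link_down", "eth1")}

result = find_sources(root, objects_db, lambda err, obj: (err, obj) in observed)
print(result)
```

In this sketch the application reports `app_timeout`, the search descends only through errors that are confirmed on some actual object, and the result set returned to the application contains the leaf-most confirmed error rather than every detected one.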
Citation:
Masato Sakai, Hiroya Matsuba, Yutaka Ishikawa, "Fault Detection System Activated by Failure Information," prdc, pp.19-26, 13th Pacific Rim International Symposium on Dependable Computing (PRDC 2007), 2007