Issue No. 04 - October-December (2006 vol. 3)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TDSC.2006.52
Guofei Jiang, IEEE
With the prevalence of Internet services and the increase of their complexity, there is a growing need to improve their operational reliability and availability. While a large amount of monitoring data can be collected from systems for fault analysis, it is hard to correlate this data effectively across distributed systems and observation time. In this paper, we analyze the mass characteristics of user requests and propose a novel approach to model and track transaction flow dynamics for fault detection in complex information systems. We measure the flow intensity at multiple checkpoints inside the system and apply system identification methods to model transaction flow dynamics between these measurements. With the learned analytical models, a model-based fault detection and isolation method is applied to track the flow dynamics in real time for fault detection. We also propose an algorithm to automatically search and validate the dynamic relationship between randomly selected monitoring points. Our algorithm enables systems to have self-cognition capability for system management. Our approach is tested in a real system with a list of injected faults. Experimental results demonstrate the effectiveness of our approach and algorithms.
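The pipeline described in the abstract (measure flow intensity at two checkpoints, learn a model relating them, then track the residual in real time) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's actual formulation: the first-order model y[t] = a*y[t-1] + b*x[t], the synthetic data produced by the hypothetical `simulate` helper, the injected fault (half of the upstream flow stops reaching the downstream checkpoint), and the threshold rule (twice the largest training residual).

```python
import random


def simulate(n, seed, fault_start=None, a=0.5, b=0.4):
    """Synthetic flow intensities at an upstream (x) and downstream (y) checkpoint.

    Hypothetical data generator: from fault_start on, only half of the
    upstream flow reaches the downstream checkpoint.
    """
    rng = random.Random(seed)
    x = [rng.uniform(80, 120) for _ in range(n)]
    y = [b * x[0] / (1 - a)]  # start near the steady state
    for t in range(1, n):
        gain = b * 0.5 if fault_start is not None and t >= fault_start else b
        y.append(a * y[t - 1] + gain * x[t] + rng.gauss(0, 0.5))
    return x, y


def fit_model(x, y):
    """Least-squares fit of y[t] = a*y[t-1] + b*x[t] via the 2x2 normal equations."""
    s_pp = s_px = s_xx = s_py = s_xy = 0.0
    for t in range(1, len(y)):
        p, u, target = y[t - 1], x[t], y[t]
        s_pp += p * p
        s_px += p * u
        s_xx += u * u
        s_py += p * target
        s_xy += u * target
    det = s_pp * s_xx - s_px * s_px
    if abs(det) < 1e-12:
        raise ValueError("data cannot identify the two parameters")
    a = (s_py * s_xx - s_xy * s_px) / det
    b = (s_pp * s_xy - s_px * s_py) / det
    return a, b


def residuals(x, y, a, b):
    """One-step-ahead prediction errors for t = 1 .. len(y) - 1."""
    return [y[t] - (a * y[t - 1] + b * x[t]) for t in range(1, len(y))]


def detect(x, y, a, b, threshold):
    """Time indices whose absolute residual exceeds the threshold."""
    res = residuals(x, y, a, b)
    return [t for t, r in enumerate(res, start=1) if abs(r) > threshold]


if __name__ == "__main__":
    # Learn the relationship from fault-free data.
    x_train, y_train = simulate(500, seed=1)
    a, b = fit_model(x_train, y_train)
    # Assumed threshold rule: twice the largest residual seen in training.
    threshold = 2 * max(abs(r) for r in residuals(x_train, y_train, a, b))

    # Track a new trace in which a fault is injected at t = 60.
    x_test, y_test = simulate(100, seed=2, fault_start=60)
    alarms = detect(x_test, y_test, a, b, threshold)
    print("fitted a, b:", a, b)
    print("first alarm at t =", alarms[0] if alarms else None)
```

The design choice worth noting is that the detector never inspects the raw intensities in isolation: load can rise and fall freely, and an alarm is raised only when the learned relationship between the two checkpoints stops holding.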
Fault detection, information systems, system management, regression model, model-based FDI, dynamic relationship, model validation, flow intensity and dynamics.
K. Yoshihira, H. Chen and G. Jiang, "Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems," in IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 4, pp. 312-326, 2006.