Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems
October-December 2006 (vol. 3 no. 4)
pp. 312-326
With the prevalence of Internet services and the increase of their complexity, there is a growing need to improve their operational reliability and availability. While a large amount of monitoring data can be collected from systems for fault analysis, it is hard to correlate this data effectively across distributed systems and observation time. In this paper, we analyze the mass characteristics of user requests and propose a novel approach to model and track transaction flow dynamics for fault detection in complex information systems. We measure the flow intensity at multiple checkpoints inside the system and apply system identification methods to model transaction flow dynamics between these measurements. With the learned analytical models, a model-based fault detection and isolation method is applied to track the flow dynamics in real time for fault detection. We also propose an algorithm to automatically search and validate the dynamic relationship between randomly selected monitoring points. Our algorithm enables systems to have self-cognition capability for system management. Our approach is tested in a real system with a list of injected faults. Experimental results demonstrate the effectiveness of our approach and algorithms.
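
As a rough illustration of the idea in the abstract, the sketch below fits a regression model between the flow intensities at two checkpoints and then tracks the model's residual to flag a fault. It makes several assumptions that the abstract does not state: a first-order linear model of the form y[t] = a*y[t-1] + b*x[t], a least-squares fit, synthetic data, and a threshold set to three times the largest training residual. The names fit_arx, residuals, detect, and simulate are hypothetical and are not taken from the paper.

```python
import random


def fit_arx(x, y):
    """Least-squares fit of y[t] = a*y[t-1] + b*x[t] (assumed model form)."""
    s_pp = s_px = s_xx = s_py = s_xy = 0.0
    for t in range(1, len(y)):
        p, u, target = y[t - 1], x[t], y[t]
        s_pp += p * p
        s_px += p * u
        s_xx += u * u
        s_py += p * target
        s_xy += u * target
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_pp * s_xx - s_px * s_px
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; cannot identify the model")
    a = (s_py * s_xx - s_px * s_xy) / det
    b = (s_pp * s_xy - s_px * s_py) / det
    return a, b


def residuals(x, y, a, b):
    """One-step-ahead prediction errors; entry i corresponds to time t = i + 1."""
    return [y[t] - (a * y[t - 1] + b * x[t]) for t in range(1, len(y))]


def detect(x, y, a, b, threshold):
    """Return the time indices whose absolute residual exceeds the threshold."""
    res = residuals(x, y, a, b)
    return [t for t, r in enumerate(res, start=1) if abs(r) > threshold]


def simulate(n, fault_start=None, seed=0):
    """Synthetic flow intensities at an upstream (x) and downstream (y) checkpoint.

    From fault_start onward, the downstream point sees only half of the
    usual share of new requests (a made-up fault for illustration).
    """
    rng = random.Random(seed)
    x, y = [], []
    prev = 80.0
    for t in range(n):
        u = rng.uniform(50.0, 150.0)
        gain = 0.4 if fault_start is None or t < fault_start else 0.2
        cur = 0.5 * prev + gain * u + rng.uniform(-0.5, 0.5)
        x.append(u)
        y.append(cur)
        prev = cur
    return x, y


if __name__ == "__main__":
    x_train, y_train = simulate(200, seed=1)          # fault-free history
    a, b = fit_arx(x_train, y_train)
    limit = 3 * max(abs(r) for r in residuals(x_train, y_train, a, b))
    x_new, y_new = simulate(200, fault_start=120, seed=2)
    alarms = detect(x_new, y_new, a, b, limit)
    print("first alarm at t =", alarms[0] if alarms else None)
```

The model is learned only from fault-free data, so a sustained residual above the threshold indicates that the relationship between the two checkpoints no longer holds. A real deployment would need a model order and threshold chosen from the data rather than the fixed values assumed here.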

Index Terms:
Fault detection, information systems, system management, regression model, model-based FDI, dynamic relationship, model validation, flow intensity and dynamics.
Guofei Jiang, Haifeng Chen, Kenji Yoshihira, "Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems," IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 4, pp. 312-326, Oct.-Dec. 2006, doi:10.1109/TDSC.2006.52