Monitoring High-Dimensional Data for Failure Detection and Localization in Large-Scale Computing Systems
January 2008 (vol. 20 no. 1)
pp. 13-25
Processing high-dimensional measurements for failure detection and localization is a major challenge in large-scale computing systems. In information systems, however, those measurements usually lie on a low-dimensional structure embedded in the high-dimensional space. From this perspective, this paper proposes a novel approach that models the geometry of the underlying data generation and detects anomalies based on that model. We consider both linear and nonlinear data generation models. Two statistics, the Hotelling $T^2$ and the squared prediction error ($SPE$), are used to reflect data variations within and outside the model, respectively. We track the probability density of the extracted statistics to monitor the system's health. After a failure has been detected, a localization process identifies the attributes most likely to be related to the failure. Experimental results on both synthetic data and a real e-commerce application demonstrate the effectiveness of our approach in detecting and localizing failures in computing systems.
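To make the two statistics concrete, the following is a minimal sketch of the linear case only, assuming plain PCA as the low-dimensional model. The abstract does not spell out the paper's actual models, density tracking, or localization procedure, so the function names (`fit_linear_model`, `t2_and_spe`), the synthetic data, and the largest-residual heuristic for picking a suspicious attribute are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def fit_linear_model(X, k):
    """Fit a k-dimensional PCA subspace to normal-operation data X (n x d)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:k].T                          # d x k loading matrix
    var = s[:k] ** 2 / (X.shape[0] - 1)   # variance along each retained direction
    return mean, P, var

def t2_and_spe(x, mean, P, var):
    """Return Hotelling T^2 (variation inside the model), SPE (variation
    outside the model), and the residual vector for one sample or a batch."""
    xc = x - mean
    scores = xc @ P                           # coordinates inside the subspace
    t2 = np.sum(scores ** 2 / var, axis=-1)   # variance-scaled squared scores
    residual = xc - scores @ P.T              # part the model cannot explain
    spe = np.sum(residual ** 2, axis=-1)
    return t2, spe, residual

# Hypothetical synthetic data: 9 attributes driven by 3 latent factors,
# attribute j depending only on factor j % 3, plus small measurement noise.
rng = np.random.default_rng(0)
n, d, k = 500, 9, 3
A = np.zeros((k, d))
for j in range(d):
    A[j % k, j] = 1.0
X_train = rng.normal(size=(n, k)) @ A + 0.01 * rng.normal(size=(n, d))

mean, P, var = fit_linear_model(X_train, k)

# One normal sample, and a copy with an injected fault on attribute 7.
x_normal = rng.normal(size=k) @ A + 0.01 * rng.normal(size=d)
x_fault = x_normal.copy()
x_fault[7] += 5.0

t2_n, spe_n, _ = t2_and_spe(x_normal, mean, P, var)
t2_f, spe_f, res_f = t2_and_spe(x_fault, mean, P, var)

# Assumed localization heuristic: the attribute with the largest residual.
suspect = int(np.argmax(np.abs(res_f)))
print(f"normal: T2={t2_n:.2f} SPE={spe_n:.5f}")
print(f"fault:  T2={t2_f:.2f} SPE={spe_f:.5f} suspect attribute={suspect}")
```

In this sketch the injected fault breaks the correlation that attribute 7 normally shares with the other attributes driven by the same factor, so it shows up mainly in $SPE$ rather than in $T^2$. A fixed threshold or a density estimate over the two statistics would be needed to turn them into an alarm; that step is omitted here.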


Index Terms:
failure detection, manifold learning, statistics, data mining, information system, Internet applications
Citation:
Haifeng Chen, Guofei Jiang, Kenji Yoshihira, "Monitoring High-Dimensional Data for Failure Detection and Localization in Large-Scale Computing Systems," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 1, pp. 13-25, Jan. 2008, doi:10.1109/TKDE.2007.190674