Issue No.01 - January (2008 vol.20)

pp: 13-25

ABSTRACT

Processing high-dimensional measurements for failure detection and localization in large-scale computing systems is a major challenge. However, in information systems those measurements are observed to lie, in most cases, on a low-dimensional structure embedded in the high-dimensional space. From this perspective, this paper proposes a novel approach that models the geometry of the underlying data generation and detects anomalies based on that model. We consider both linear and nonlinear data generation models. Two statistics, the Hotelling $T^2$ and the squared prediction error ($SPE$), are used to reflect data variations within and outside the model, respectively. We track the probability density of the extracted statistics to monitor the system's health. After a failure has been detected, a localization process identifies the most suspicious attributes related to the failure. Experimental results on both synthetic data and a real e-commerce application demonstrate the effectiveness of our approach in detecting and localizing failures in computing systems.
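To illustrate the roles of the two statistics, here is a minimal sketch for the linear case only. It assumes the linear model is fitted with principal component analysis, which the abstract does not state explicitly; the function names `fit_pca` and `t2_and_spe` are hypothetical, and this is not the authors' implementation. The nonlinear model, the density tracking, and the localization step are not shown.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a k-component linear (PCA) model to rows of X (assumed normal-operation data)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                        # (d, k) orthonormal basis of the model subspace
    var = (s[:k] ** 2) / (len(X) - 1)   # sample variance along each retained direction
    return mean, P, var

def t2_and_spe(x, mean, P, var):
    """Return (T^2, SPE) for one measurement vector x."""
    xc = x - mean
    scores = P.T @ xc                   # coordinates inside the model subspace
    t2 = float(np.sum(scores ** 2 / var))   # variation within the model
    residual = xc - P @ scores          # part of x the model cannot reconstruct
    spe = float(residual @ residual)    # variation outside the model
    return t2, spe

# Synthetic example (an assumption for illustration): 10-D measurements
# generated from a 2-D latent structure plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
X += 0.01 * rng.normal(size=X.shape)
mean, P, var = fit_pca(X, k=2)
t2, spe = t2_and_spe(X[0], mean, P, var)
```

In this sketch, a point that moves far along the fitted subspace raises $T^2$ while leaving $SPE$ near zero, and a point that leaves the subspace raises $SPE$; that separation is what lets the two statistics report on variation within and outside the model.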

INDEX TERMS

failure detection, manifold learning, statistics, data mining, information system, Internet applications

CITATION

Haifeng Chen, Guofei Jiang, Kenji Yoshihira, "Monitoring High-Dimensional Data for Failure Detection and Localization in Large-Scale Computing Systems", *IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 1, pp. 13-25, January 2008, doi:10.1109/TKDE.2007.190674