June 13, 2005 to June 16, 2005
Guofei Jiang, NEC Laboratories America
Haifeng Chen, NEC Laboratories America
Cristian Ungureanu, NEC Laboratories America
Kenji Yoshihira, NEC Laboratories America
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICAC.2005.42
Detecting and diagnosing faults in a large-scale distributed system is a formidable task. Interest in monitoring traces of user requests and using them for fault detection has grown recently. In this paper we propose novel fault detection methods based on abnormal trace detection. One essential problem is how to represent the large amount of training trace data compactly as an oracle. Our key contribution is the novel use of varied-length n-grams and automata to characterize normal traces. A new trace is compared against the learned automata to determine whether it is abnormal. We develop algorithms to automatically extract n-grams and construct multi-resolution automata from training data. Furthermore, both deterministic and multi-hypothesis algorithms are proposed for detection. We inspect the trace constraints of real application software and verify the existence of long n-grams. Our approach is tested in a real system with injected faults and achieves good results in experiments.
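As a rough illustration of the general idea only, the sketch below learns the set of n-grams seen in normal traces at a few fixed lengths and flags a new trace as abnormal if it contains an n-gram never seen in training. This is a simplified stand-in, not the paper's algorithm: it uses plain n-gram sets rather than varied-length n-gram extraction and automata, and the function names, the component names in the traces, and the choice of lengths 2 and 3 are all assumptions made for the example.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def extract_ngrams(trace: Sequence[str], n: int) -> Set[NGram]:
    """Return the set of contiguous length-n subsequences of a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def learn_model(traces: Iterable[Sequence[str]],
                lengths: Sequence[int] = (2, 3)) -> Dict[int, Set[NGram]]:
    """Collect, for each length n, every n-gram seen in the normal traces."""
    model: Dict[int, Set[NGram]] = {n: set() for n in lengths}
    for trace in traces:
        for n in lengths:
            model[n] |= extract_ngrams(trace, n)
    return model


def is_abnormal(trace: Sequence[str], model: Dict[int, Set[NGram]]) -> bool:
    """Flag a trace if any of its n-grams was never seen during training."""
    for n, known in model.items():
        if not extract_ngrams(trace, n) <= known:
            return True
    return False


# Hypothetical training traces: each is the ordered list of components
# that one user request passed through.
normal_traces = [
    ["web", "app", "db"],
    ["web", "app", "cache", "db"],
]
model = learn_model(normal_traces)

print(is_abnormal(["web", "app", "db"], model))  # seen in training
print(is_abnormal(["web", "db"], model))         # skips the "app" component
```

Checking several lengths at once is a loose analogue of working at multiple resolutions: shorter n-grams tolerate more variation, while longer ones encode stricter constraints on request paths.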
Guofei Jiang, Haifeng Chen, Cristian Ungureanu, Kenji Yoshihira, "Multi-resolution Abnormal Trace Detection Using Varied-length N-grams and Automata", Proceedings. Second International Conference on Autonomic Computing (ICAC), 2005, pp. 111-122, doi:10.1109/ICAC.2005.42