Dec. 6, 2009 to Dec. 9, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2009.60
Detection of execution anomalies is very important for the maintenance, development, and performance refinement of large-scale distributed systems. Execution anomalies include both workflow errors and low performance problems. People often use system logs produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting system logs to detect anomalies is infeasible due to the increasing scale and complexity of distributed systems. Therefore, there is a great demand for automatic anomaly detection techniques based on log analysis. In this paper, we propose an unstructured log analysis technique for anomaly detection. In the technique, we propose a novel algorithm to convert free-form text messages in log files to log keys without heavily relying on application-specific knowledge. The log keys correspond to the log-print statements in the source code, which can provide cues of system execution behavior. After converting log messages to log keys, we learn a Finite State Automaton (FSA) from training log sequences to represent the normal workflow for each system component. At the same time, a performance measurement model is learned to characterize the normal execution performance based on the log messages’ timing information. With these learned models, we can automatically detect anomalies in newly input log files. Experiments on Hadoop and SILK show that the technique can effectively detect running anomalies.
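The pipeline described above (messages to log keys, a learned workflow model, and a timing model) can be illustrated with a minimal sketch. It is not the algorithm from the paper, which the abstract does not specify. It assumes that variable parameters can be masked with a few hypothetical regular expressions, it stands in for the FSA with a simple set of observed key-to-key transitions, and it approximates the performance model with the largest gap seen per transition. All function names, patterns, and the `tolerance` parameter are illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical masking rules (assumption, not from the paper): tokens that
# look like parameters are replaced so that messages printed by the same
# log statement collapse to one log key.
_PARAM_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                     # hex literals
    re.compile(r"\b\w*\d\w*\b"),                           # any token with a digit
]

START, END = "<start>", "<end>"


def to_log_key(message):
    """Mask parameter-like tokens and normalize whitespace."""
    for pattern in _PARAM_PATTERNS:
        message = pattern.sub("<*>", message)
    return " ".join(message.split())


def learn_transitions(sequences):
    """Record every key-to-key transition seen in normal training sequences.

    This is a first-order simplification of an FSA: each log key is a state.
    """
    allowed = defaultdict(set)
    for seq in sequences:
        path = [START, *seq, END]
        for prev, nxt in zip(path, path[1:]):
            allowed[prev].add(nxt)
    return dict(allowed)


def find_workflow_anomalies(allowed, seq):
    """Return (position, prev_key, next_key) for transitions never seen in training."""
    path = [START, *seq, END]
    return [
        (i, prev, nxt)
        for i, (prev, nxt) in enumerate(zip(path, path[1:]))
        if nxt not in allowed.get(prev, set())
    ]


def learn_max_gaps(timed_sequences):
    """For each transition, remember the largest time gap seen in training.

    Each sequence is a list of (timestamp_seconds, log_key) pairs.
    """
    max_gap = {}
    for seq in timed_sequences:
        for (t0, k0), (t1, k1) in zip(seq, seq[1:]):
            max_gap[(k0, k1)] = max(t1 - t0, max_gap.get((k0, k1), 0.0))
    return max_gap


def find_slow_transitions(max_gap, seq, tolerance=1.5):
    """Flag known transitions whose gap exceeds tolerance * the training maximum."""
    slow = []
    for (t0, k0), (t1, k1) in zip(seq, seq[1:]):
        limit = max_gap.get((k0, k1))
        if limit is not None and (t1 - t0) > tolerance * limit:
            slow.append((k0, k1, t1 - t0))
    return slow
```

In this sketch, two messages that differ only in a block identifier or an address map to the same key, a sequence is reported as a workflow anomaly when it contains a transition absent from the training data, and a low performance problem is reported when a known transition takes noticeably longer than it ever did in training. A real FSA learner would also merge and generalize states, which this transition set does not attempt.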
log analysis, distributed system, problem diagnosis, finite state automaton
Qiang Fu, Jian-Guang Lou, Yi Wang, Jiang Li, "Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis", IEEE International Conference on Data Mining (ICDM), 2009, pp. 149-158, doi:10.1109/ICDM.2009.60