Dependable Systems and Networks Workshops (2011)
Hong Kong, China
June 27, 2011 to June 30, 2011
ISBN: 978-1-4577-0374-4
pp: 140-145
Qi Zhou , Computing Platform, Alibaba Cloud Computing Corporation, Hangzhou, P.R. China
Dianxi Shi , National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, P.R. China 410073
Hua Cai , Computing Platform, Alibaba Cloud Computing Corporation, Hangzhou, P.R. China
Huaimin Wang , National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, P.R. China 410073
Tingtao Sun , Computing Platform, Alibaba Cloud Computing Corporation, Hangzhou, P.R. China
Xiang Rao , National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, P.R. China 410073
Zhenbang Chen , National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, P.R. China 410073
ABSTRACT
Extracting fault features from the error logs of fault injection tests has been studied in the area of large-scale distributed systems for decades. However, the feature extraction process is severely affected by large volumes of noisy logs. Existing work tries to solve the problem by compressing logs in temporal and spatial views or by removing semantic redundancy between logs, but it fails to consider the co-existence of noisy faults, that is, faults other than the injected ones that also generate error logs: for example, random hardware faults, unexpected software bugs, system configuration faults, or incorrectly ranked log severities. During fault feature extraction, these noisy faults generate error logs that are unrelated to the target fault and strongly mislead the resulting fault features. We call an error log that is not related to the target fault a noisy error log. To filter out noisy error logs, we present a similarity-based error log filtering method, SBF, which consists of three integrated steps: (1) model error logs as time series and apply the Haar wavelet transform to obtain approximate time series; (2) divide the approximate time series into sub time series at valleys; (3) identify noisy error logs by comparing the similarity between the sub time series of target error logs and the template of noisy error logs. We apply our log filtering method to an enterprise cloud system and show its effectiveness. Compared with existing work, our method successfully filters out noisy error logs and increases both the precision and the recall of fault feature extraction.
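The three steps named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the decomposition depth, the valley criterion, or the similarity measure, so the pairwise-averaging Haar approximation, the local-minimum valley rule, the cosine similarity, the 0.9 threshold, and all function names below are assumptions chosen for illustration.

```python
from math import sqrt


def haar_approximation(series, levels=1):
    """Approximate a series with `levels` passes of pairwise averaging
    (the unnormalised Haar approximation coefficients)."""
    approx = list(series)
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:
            approx.append(approx[-1])  # assumption: pad odd lengths with the last value
        approx = [(approx[i] + approx[i + 1]) / 2 for i in range(0, len(approx), 2)]
    return approx


def split_by_valleys(series):
    """Split a series at local minima; each valley closes one segment
    and opens the next, so neighbouring segments share the valley point."""
    cuts = [
        i for i in range(1, len(series) - 1)
        if series[i] < series[i - 1] and series[i] <= series[i + 1]
    ]
    segments, start = [], 0
    for cut in cuts:
        segments.append(series[start:cut + 1])
        start = cut
    segments.append(series[start:])
    return segments


def cosine_similarity(a, b):
    """Cosine similarity over the common prefix of two sequences
    (assumption: truncation is a crude way to align unequal lengths)."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_noisy_segments(counts, template, threshold=0.9, levels=1):
    """Return (segment, is_noisy) pairs for a series of error-log counts
    per time bucket, compared against a hypothetical noise template."""
    approx = haar_approximation(counts, levels)
    return [
        (segment, cosine_similarity(segment, template) >= threshold)
        for segment in split_by_valleys(approx)
    ]
```

Here `counts` would hold the number of error logs per time bucket and `template` the shape of a known noisy burst; a segment whose similarity to the template reaches the threshold is treated as noise and excluded before feature extraction.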
CITATION
Qi Zhou, Dianxi Shi, Hua Cai, Huaimin Wang, Tingtao Sun, Xiang Rao, Zhenbang Chen, "Identifying faults in large-scale distributed systems by filtering noisy error logs", Dependable Systems and Networks Workshops, pp. 140-145, 2011, doi:10.1109/DSNW.2011.5958800