Issue No.11 - Nov. (2012 vol.24)
pp: 1921-1936
Adetokunbo Makanju, Dalhousie University, Nova Scotia
A. Nur Zincir-Heywood, Dalhousie University, Nova Scotia
Evangelos E. Milios, Dalhousie University, Nova Scotia
ABSTRACT
Message type or message cluster extraction is an important task in the analysis of system logs in computer networks. Defining these message types automatically facilitates the automatic analysis of system logs. When the message types that exist in a log file are represented explicitly, they can form the basis for carrying out other automatic application log analysis tasks. In this paper, we introduce a novel algorithm for carrying out message type extraction from event log files. IPLoM, which stands for Iterative Partitioning Log Mining, works through a four-step process. The first three steps hierarchically partition the event log into groups of event log messages, or event clusters. In its fourth and final step, IPLoM produces a message type description, or line format, for each of the message clusters. IPLoM is able to find clusters in data irrespective of the frequency of their instances in the data, it scales gracefully in the case of long message type patterns, and it produces message type descriptions at the level of abstraction that a human observer prefers. Evaluations show that IPLoM outperforms similar algorithms by a statistically significant margin.
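The partition-then-describe idea can be illustrated with a minimal sketch. The abstract does not state the criteria used in each partitioning step, so the ones below (grouping by token count, then splitting on the token position with the fewest distinct values) are assumptions chosen for illustration. The third partitioning step is omitted entirely, and all function names are hypothetical. This is not the authors' implementation.

```python
from collections import defaultdict


def partition_by_token_count(messages):
    """Assumed first step: group messages that have the same number of tokens."""
    groups = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        groups[len(tokens)].append(tokens)
    return list(groups.values())


def partition_by_token_position(group):
    """Assumed second step: split on the position with the fewest distinct tokens."""
    if not group or not group[0]:
        return [group]
    width = len(group[0])
    # Pick the column whose set of tokens is smallest (first one on ties).
    pos = min(range(width), key=lambda i: len({tokens[i] for tokens in group}))
    parts = defaultdict(list)
    for tokens in group:
        parts[tokens[pos]].append(tokens)
    return list(parts.values())


def describe(cluster):
    """Final step: keep tokens shared by every message, wildcard the rest."""
    return " ".join(
        column[0] if len(set(column)) == 1 else "*"
        for column in zip(*cluster)
    )


def extract_message_types(messages):
    """Run the simplified pipeline and return one description per cluster."""
    types = []
    for group in partition_by_token_count(messages):
        for cluster in partition_by_token_position(group):
            types.append(describe(cluster))
    return types


if __name__ == "__main__":
    log = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Disk full",
    ]
    for line_format in extract_message_types(log):
        print(line_format)
```

In this sketch the two "Connection" lines end up in one cluster whose description replaces the varying address with `*`, while "Disk full" forms its own cluster even though it occurs only once, which mirrors the abstract's point that clusters are found irrespective of how frequent their instances are.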
INDEX TERMS
Kernel, Data mining, Humans, Clustering algorithms, Buildings, Observers, Partitioning algorithms, clustering, Algorithms, experimentation, event log mining, fault management
CITATION
Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios, "A Lightweight Algorithm for Message Type Extraction in System Application Logs", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 11, pp. 1921-1936, Nov. 2012, doi:10.1109/TKDE.2011.138