2012 IEEE 12th International Conference on Data Mining Workshops (2012)
Brussels, Belgium
Dec. 10, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2012.97
In this paper, we present our vision of how statistical dependency rule mining could be applied to a thorough analysis of log data. Dependency rules are especially attractive as a first-step mining method due to their efficient algorithms and globally optimal results. The major drawback is the rather specific form of the dependencies, which requires binary data. It is not always clear how heterogeneous real-world data should be binarized and how the tools should be used so that all interesting dependencies are caught. We give an overview of typical problems in analyzing log data. The three major problems are: 1) How to balance between groups and individuals so that both general regularities and individual peculiarities can be found? 2) How to handle numerical and periodic variables? 3) How to extract features from the intrinsic dimensions of log data? For each problem, we give practical solutions in the form of preprocessing techniques and constraints that can be used with the existing tools. We also point out important open research problems and algorithmic challenges.
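To make the binarization issue concrete, the following is a minimal sketch of how one log record with a numerical variable (a duration) and a periodic variable (the hour of day) might be turned into binary attributes. It is an illustration only and not the method of the paper: the function name `binarize_record`, the cut points, the time-of-day bins, and the attribute labels are all hypothetical choices.

```python
from datetime import datetime


def binarize_record(timestamp, duration_ms, duration_cuts=(100, 1000)):
    """Map one log record to a set of binary attributes (items).

    The cut points and bin labels are hypothetical; in practice they
    would be chosen from the data or from domain knowledge.
    """
    items = set()

    # Numerical variable: discretize into three ordered intervals.
    low, high = duration_cuts
    if duration_ms < low:
        items.add("duration=short")
    elif duration_ms < high:
        items.add("duration=medium")
    else:
        items.add("duration=long")

    # Periodic variable: the "night" bin wraps around midnight
    # (22:00-05:59), which a plain linear cut on the hour would split.
    hour = timestamp.hour
    if hour >= 22 or hour < 6:
        items.add("time=night")
    elif hour < 12:
        items.add("time=morning")
    elif hour < 18:
        items.add("time=afternoon")
    else:
        items.add("time=evening")

    return items


# Example: a slow request logged shortly before midnight.
record_items = binarize_record(datetime(2012, 12, 10, 23, 30), 1500)
```

Each resulting set can then be treated as one row of binary data. The wrap-around bin shows why periodic variables need separate care: 23:30 and 02:00 fall into the same bin although their hour values are far apart.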
Data mining, Automata, Cows, Redundancy, Feeds, Algorithm design and analysis, Feature extraction, preprocessing, Dependency rule, log data, numerical variable, discretization, hierarchical variable, intrinsic dimensionality
W. Hamalainen, "Thorough Analysis of Log Data with Dependency Rules: Practical Solutions and Theoretical Challenges," 2012 IEEE 12th International Conference on Data Mining Workshops (ICDMW), Brussels, Belgium, 2012, pp. 579-586.