2012 IEEE 28th International Conference on Data Engineering (ICDE)
Arlington, Virginia USA
Apr. 1, 2012 to Apr. 5, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2012.88
Outlier mining is a major task in data analysis. Outliers are objects that deviate strongly from regular objects in their local neighborhood. Density-based outlier ranking methods score each object based on its degree of deviation. In many applications, these ranking methods degenerate to random listings due to low contrast between outliers and regular objects. Outliers do not show up in the scattered full space; they are hidden in multiple high-contrast subspace projections of the data. Measuring the contrast of such subspaces for outlier rankings is an open research challenge. In this work, we propose a novel subspace search method that selects high-contrast subspaces for density-based outlier ranking. It is designed as a pre-processing step to outlier ranking algorithms. It searches for high-contrast subspaces with a significant amount of conditional dependence among the subspace dimensions. With our approach, we propose a first measure for the contrast of subspaces. Thus, we enhance the quality of traditional outlier rankings by computing outlier scores in high-contrast projections only. The evaluation on real and synthetic data shows that our approach outperforms traditional dimensionality reduction techniques, naive random projections, and state-of-the-art subspace search techniques, and that it provides enhanced quality for outlier ranking.
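The abstract does not spell out how contrast is computed, so the following is only a minimal sketch of the general idea that conditional dependence among subspace dimensions can be turned into a score. It is not the authors' algorithm. It assumes that contrast can be approximated by repeatedly conditioning on a random slice of all but one dimension and comparing the conditional distribution of the remaining dimension with its marginal distribution. The function names `ks_statistic` and `subspace_contrast`, the parameters `n_iter` and `slice_frac`, and the choice of a Kolmogorov-Smirnov-style deviation are all hypothetical choices made for illustration.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def subspace_contrast(data, dims, n_iter=50, slice_frac=0.1, seed=0):
    """Illustrative contrast score in [0, 1] for the subspace `dims` (len >= 2).

    Assumption of this sketch: strong dependence among the dimensions shows up
    as a large deviation between marginal and conditional distributions.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Width of each one-dimensional condition, chosen so that roughly
    # slice_frac of the objects would survive all conditions if the
    # dimensions were independent.
    width = max(2, int(np.ceil(n * slice_frac ** (1.0 / (len(dims) - 1)))))
    deviations = []
    for _ in range(n_iter):
        # Pick the dimension whose distribution is compared in this round.
        target = dims[rng.integers(len(dims))]
        mask = np.ones(n, dtype=bool)
        for d in dims:
            if d == target:
                continue
            # Condition on a random contiguous block of ranks in dimension d.
            order = np.argsort(data[:, d])
            start = rng.integers(0, n - width + 1)
            in_block = np.zeros(n, dtype=bool)
            in_block[order[start:start + width]] = True
            mask &= in_block
        if mask.sum() < 2:
            continue  # slice too small to compare distributions
        deviations.append(ks_statistic(data[:, target], data[mask, target]))
    return float(np.mean(deviations)) if deviations else 0.0


# Synthetic demo: dimension 1 depends on dimension 0, dimension 2 is independent.
rng = np.random.default_rng(42)
x0 = rng.uniform(size=1000)
x1 = x0 + rng.normal(scale=0.02, size=1000)
x2 = rng.uniform(size=1000)
data = np.column_stack([x0, x1, x2])

dependent = subspace_contrast(data, [0, 1])
independent = subspace_contrast(data, [0, 2])
print(f"contrast of dependent subspace   [0, 1]: {dependent:.3f}")
print(f"contrast of independent subspace [0, 2]: {independent:.3f}")
```

Under these assumptions, subspaces could be ranked by such a score and a density-based outlier ranking then computed only in the top-ranked projections, which is the pre-processing role the abstract describes.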
Emmanuel Müller, Fabian Keller, Klemens Böhm, "HiCS: High Contrast Subspaces for Density-Based Outlier Ranking", 2012 IEEE 28th International Conference on Data Engineering (ICDE), pp. 1037-1048, 2012, doi:10.1109/ICDE.2012.88