2007 Seventh IEEE International Conference on Data Mining
Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection
Omaha, Nebraska, USA
October 28-October 31
ISBN: 0-7695-3018-4
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2007.53
Identifying atypical objects is a traditional topic in machine learning. Recent approaches, e.g., Minority Detection and One-class clustering, go further and identify clusters of atypical objects that contrast strongly with the rest of the data in their distribution or density. This paper analyzes such tasks from an information-theoretic perspective. Under an Information Bottleneck formalization, these tasks amount to increasing the averaged atypicalness of the clusters while reducing the complexity of the clustering. This formalization yields a unifying view of the new approaches as well as classic outlier detection. We also present a scalable minimization algorithm that exploits the localized form of the cost function over individual clusters. The proposed algorithm is evaluated on simulated datasets and a text classification benchmark, in comparison with an existing method.
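For orientation, the generic Information Bottleneck objective that this kind of formalization builds on can be written as below. This is the standard textbook form, not the paper's own cost function, which the abstract does not reproduce. The symbols are assumptions introduced here for illustration: X denotes the data objects, T the cluster assignment, Y a relevance variable, and \beta a trade-off parameter.

```latex
% Standard Information Bottleneck functional (generic form, symbols assumed):
%   I(X;T) measures the complexity (compression cost) of the clustering,
%   I(T;Y) measures how much relevant information the clusters retain.
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}}
  = I(X;T) \;-\; \beta \, I(T;Y)
```

In this generic form, lowering I(X;T) corresponds to reducing the complexity of the clustering; how the paper expresses the averaged atypicalness of clusters within such a trade-off is not stated in the abstract.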
Citation:
Shin Ando, "Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection," 2007 Seventh IEEE International Conference on Data Mining (ICDM), pp. 13-22, 2007