Issue No. 05 - May (2015 vol. 27)
ISSN: 1041-4347
pp: 1369-1382
Mirjana Ivanovic, Faculty of Sciences, University of Novi Sad, Serbia
ABSTRACT
Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality.” A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers. In this paper, we provide evidence supporting the opinion that such a view is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings. Furthermore, we show that high dimensionality can have a different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points’ reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provide insight into how some points (antihubs) appear very infrequently in $k$-NN lists of other points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-detection methods. By evaluating the classic $k$-NN method, the angle-based technique designed for high-dimensional data, the density-based local outlier factor and influenced outlierness methods, and antihub-based methods on various synthetic and real-world data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.
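The reverse-neighbor idea described in the abstract can be illustrated with a minimal sketch. It counts, for each point, how many other points include it in their $k$-NN lists, and turns low counts into high outlier scores. The function names, the brute-force Euclidean distance computation, and the score 1/(count + 1) are assumptions made for illustration; the abstract does not specify the scoring function used by the antihub-based methods.

```python
import numpy as np


def reverse_neighbor_counts(X, k):
    """Count how often each point appears in the k-NN lists of the other points.

    Brute-force sketch using Euclidean distance; suitable for small data only.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # A point is never counted as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(d2, axis=1)[:, :k]
    # Number of k-NN lists in which each index occurs.
    return np.bincount(knn.ravel(), minlength=n)


def antihub_scores(X, k):
    """Illustrative outlier score (assumed form): decreasing in the reverse-neighbor count."""
    return 1.0 / (reverse_neighbor_counts(X, k) + 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cluster = rng.normal(size=(50, 10))      # synthetic inliers
    far_point = np.full((1, 10), 100.0)      # one synthetic, distant point
    data = np.vstack([cluster, far_point])
    scores = antihub_scores(data, k=5)
    print("index of highest score:", int(np.argmax(scores)))
```

Points that rarely or never occur in other points' neighbor lists (antihubs) receive scores near 1, while frequently occurring points (hubs) receive scores near 0.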
INDEX TERMS
Standards, Correlation, Euclidean distance, Context, Educational institutions, Noise measurement, Histograms
CITATION

M. Radovanovic, A. Nanopoulos and M. Ivanovic, "Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection," in IEEE Transactions on Knowledge & Data Engineering, vol. 27, no. 5, pp. 1369-1382, 2015.
doi:10.1109/TKDE.2014.2365790