Sixth IEEE International Conference on Data Mining (ICDM'06)
An Efficient Reference-Based Approach to Outlier Detection in Large Datasets
Hong Kong
December 18-22, 2006
ISBN: 0-7695-2701-9
Yaling Pei, University of Alberta, Canada
Osmar R. Zaiane, University of Alberta, Canada
Yong Gao, University of British Columbia Okanagan, Canada
A bottleneck in detecting distance-based and density-based outliers is that a nearest-neighbor search is required for each data point, resulting in a quadratic number of pairwise distance evaluations. In this paper, we propose a new method that uses the relative degree of density with respect to a fixed set of reference points to approximate the degree of density defined in terms of the nearest neighbors of a data point. The running time of our algorithm based on this approximation is O(Rn log n), where n is the size of the dataset and R is the number of reference points. Candidate outliers are ranked by the outlier score assigned to each data point. Theoretical analysis and empirical studies show that our method is effective, efficient, and highly scalable to very large datasets.
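The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that, for each reference point, every data point is reduced to its distance from that reference, that neighbors are then found in this sorted one-dimensional list instead of by pairwise search, and that a point's density is approximated by the inverse of the mean distance gap to its k closest neighbors in that list. The function name `reference_outlier_scores`, the parameters `k` and `eps`, and the normalisation of scores to the range 0 to 1 are all assumptions made for illustration.

```python
import math


def reference_outlier_scores(points, refs, k, eps=1e-12):
    """Score each point in [0, 1]; higher means more outlying (illustrative sketch)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # For each point, the largest mean gap seen over all reference points.
    # A large gap corresponds to a low approximate density.
    worst_gap = [0.0] * n

    for ref in refs:
        # Reduce every point to a single number: its distance to this reference.
        dist = [math.dist(p, ref) for p in points]
        order = sorted(range(n), key=dist.__getitem__)  # O(n log n) per reference
        ds = [dist[i] for i in order]

        for pos, idx in enumerate(order):
            # Grow a window around pos to collect the k closest values in 1-D.
            lo, hi, total = pos - 1, pos + 1, 0.0
            for _ in range(k):
                left = ds[pos] - ds[lo] if lo >= 0 else math.inf
                right = ds[hi] - ds[pos] if hi < n else math.inf
                if left <= right:
                    total += left
                    lo -= 1
                else:
                    total += right
                    hi += 1
            worst_gap[idx] = max(worst_gap[idx], total / k)

    # Assumed normalisation: the densest point scores 0, sparse points approach 1.
    densest = min(worst_gap)
    return [1.0 - (densest + eps) / (gap + eps) for gap in worst_gap]


# Hypothetical data: a tight cluster plus one distant point at index 5.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
refs = [(-1, -1), (12, -1), (-1, 12)]
scores = reference_outlier_scores(pts, refs, k=2)
ranking = sorted(range(len(pts)), key=scores.__getitem__, reverse=True)
print(ranking[0])  # index of the top-ranked candidate outlier
```

Taking the worst case over several reference points matters because two distant points can be equally far from one reference; in the toy data, the point at index 5 and (1, 1) are equidistant from the second reference, and only the other references separate them. This sketch spends O(k) per point on the window scan in addition to the sort.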
Citation:
Yaling Pei, Osmar R. Zaiane, Yong Gao, "An Efficient Reference-Based Approach to Outlier Detection in Large Datasets," Sixth IEEE International Conference on Data Mining (ICDM'06), pp. 478-487, 2006