This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
BORDER: Efficient Computation of Boundary Points
March 2006 (vol. 18 no. 3)
pp. 289-303
This work addresses the problem of finding boundary points in multidimensional data sets. Boundary points are data points that are located at the margin of densely distributed data such as a cluster. We describe a novel approach called BORDER (a BOundaRy points DEtectoR) to detect such points. BORDER employs the state-of-the-art database technique—the Gorder kNN join and makes use of the special property of the reverse k nearest neighbor (RkNN). Experimental studies on data sets with varying characteristics indicate that BORDER is able to detect the boundary points effectively and efficiently.

[1] C.C. Aggarwal and P.S. Yu, “Outlier Detection for High Dimensional Data,” Proc. SIGMOD, 2001.
[2] V. Barnett and T. Lewis, Outliers in Statistical Data. John Wiley and Sons, 1994.
[3] M. Basseville and I.V. Nikiforov, Detection of Abrupt Changes. PTR Prentice Hall, 1993.
[4] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles,” Proc. ACM SIGMOD, pp. 322-331, 1990.
[5] C. Bohm, S. Berchtold, and D.A. Keim, “Searching in High Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases,” ACM Computing Surveys, vol. 33, no. 3, pp. 322-373, 2001.
[6] C. Bohm and F. Krebs, “The k-Nearest Neighbor Join: Turbo Charging the KDD Process,” Knowledge and Information Systems (KAIS), vol. 6, no. 6, pp. 728-749, 2004.
[7] C. Bohm and F. Krebs, “Supporting KDD Applications by the k-Nearest Neighbor Join,” Proc. Int'l Conf. Database and Expert Systems Applications, pp. 504-516, 2003.
[8] C. Bohm and H.-P. Kriegel, “A Cost Model and Index Architecture for the Similarity Join,” Proc. Int'l Conf. Data Eng., pp. 411-420, 2001.
[9] M.M. Breunig, H.-P. Kriegel, R.T. Ng, and J. Sander, “Lof: Identifying Density-Based Local Outliers,” Proc. ACM SIGMOD, pp. 93-104, 2000.
[10] B.E. Brodsky and B.S. Darkhovsky, Nonparametric Methods in Change-Point Problems. Kluwer Academic Publishers, 1993.
[11] K. Chakrabarti and S. Mehrotra, “Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces,” Proc. Int'l Conf. Very Large Data Bases, pp. 89-100, 2000.
[12] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proc. SIGKDD, pp. 226-231, 1996.
[13] U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, Advances in Knowledge Discovery and Data Mining. AAAI Press, 1996.
[14] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc. ACM SIGMOD, pp. 47-57, 1984.
[15] J. Han and M. Kamber, Data Mining Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
[16] W. Hsu, M.-L. Lee, B.C. Ooi, P.K. Mohanty, K.L. Teo, and C. Xia, “Advanced Database Technologies in a Diabetic Healthcare System,” Proc. Int'l Conf. Very Large Data Bases, pp. 1059-1062, 2002.
[17] Data Mining Software in Java, http://www.cs.waikato.ac.nz/mlweka, 2005.
[18] A.K. Jain, M.N. Murty, and P.J. Flynn, “Data Clustering: A Review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[19] H. Jin, B.C. Ooi, H.T. Shen, C. Yu, and A.Y. Zhou, “An Adaptive and Efficient Dimensionality Reduction Algorithm for High-Dimensional Indexing,” Proc. Int'l Conf. Data Eng., pp. 87-98, 2003.
[20] G. Karypis, E.-H. Han, and V. Kumar, “Chameleon: Hierarchical Clustering Using Dynamic Modeling,” Computer, vol. 32, no. 8, pp. 68-75, 1999.
[21] G. Kollios, D. Gunopulos, and V.J. Tsotras, “Nearest Neighbor Queries in a Mobile Environment,” Spatio-Temporal Database Management, pp. 119-134, 1999.
[22] F. Korn and S. Muthukrishnan, “Influence Sets Based on Reverse Nearest Neighbor Queries,” Proc. ACM SIGMOD, pp. 201-212, 2000.
[23] F. Korn, S. Muthukrishnan, and D. Srivastava, “Reverse Nearest Neighbor Aggregates over Data Streams,” Proc. Int'l Conf. Very Large Data Bases, 2002.
[24] L. Horvath and M. Csorgo, Limit Theorems in Change-Point Analysis. Wiley, 1997.
[25] A. Singh, H. Ferhatosmanoglu, and A.S. Tosun, “High Dimensional Reverse Nearest Neighbor Queries,” Proc. Conf. Information and Knowledge Management, pp. 91-98, 2003.
[26] M. Smid, “Closest Point Problems in Computational Geometry,” Handbook on Computational Geometry, Elsevier Science Publishing, 1997.
[27] I. Stanoi, D. Agrawal, and A.E. Abbadi, “Reverse Nearest Neighbor Queries for Dynamic Databases,” Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 44-53, 2000.
[28] I. Stanoi, M. Riedewald, D. Agrawal, and A. El Abbadi, “Discovery of Influence Sets in Frequently Updated Databases,” Proc. Int'l Conf. Very Large Data Bases, pp. 99-108, 2001.
[29] Y. Tao, D. Papadias, and X. Lian, “Reverse KNN Search in Arbitrary Dimensionality,” Proc. Int'l Conf. Very Large Data Bases, pp. 744-755, 2004.
[30] C. Xia, H. Lu, B.C. Ooi, and J. Hu, “Gorder: An Efficient Method for KNN Join Processing,” Proc. Int'l Conf. Very Large Data Bases, 2004.
[31] C. Yang and K.-I. Lin, “An Index Structure for Efficient Reverse Nearest Neighbor Queries,” Proc. Int'l Conf. Data Eng., pp. 485-492, 2001.

Index Terms:
Boundary points, kNN join, k-nearest neighbor, reverse k-nearest neighbor.
Citation:
Chenyi Xia, Wynne Hsu, Mong Li Lee, Beng Chin Ooi, "BORDER: Efficient Computation of Boundary Points," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 289-303, March 2006, doi:10.1109/TKDE.2006.38
Usage of this product signifies your acceptance of the Terms of Use.