An Efficient Cost Model for Optimization of Nearest Neighbor Search in Low and Medium Dimensional Spaces
October 2004 (vol. 16 no. 10)
pp. 1169-1184
Existing models for nearest neighbor search in multidimensional spaces are not appropriate for query optimization because they either lead to erroneous estimation or involve complex equations that are expensive to evaluate in real time. This paper proposes an alternative method that captures the performance of nearest neighbor queries using approximation. For uniform data, our model involves closed formulae that are very efficient to compute and accurate for up to 10 dimensions. Further, the proposed equations can be applied to nonuniform data with the aid of histograms. We demonstrate the effectiveness of the model by using it to solve several optimization problems related to nearest neighbor search.
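
To give a sense of what a closed-form estimate for uniform data can look like, the sketch below computes a textbook first-order approximation of the distance to the k-th nearest neighbor. It is not the model proposed in the paper, whose equations are not reproduced on this page. It assumes N points uniformly distributed in a unit d-dimensional data space, Euclidean distance, and no boundary effects, and solves "volume of a d-ball of radius r = k / N" for r. The function names are illustrative only.

```python
import math


def unit_ball_volume(d):
    """Volume of the d-dimensional Euclidean ball of radius 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def expected_knn_radius(n_points, k, d):
    """Approximate distance from a query to its k-th nearest neighbor.

    Assumptions (not taken from the paper): n_points uniform points in a
    unit data space, Euclidean distance, boundary effects ignored.
    The radius r satisfies unit_ball_volume(d) * r**d == k / n_points.
    """
    return (k / (n_points * unit_ball_volume(d))) ** (1.0 / d)


if __name__ == "__main__":
    # The estimated radius grows quickly with dimensionality for fixed N and k.
    for d in (2, 4, 8, 10):
        print(d, expected_knn_radius(100_000, 10, d))
```

Because it ignores boundary effects, an estimate of this kind tends to degrade as dimensionality grows, which is the sort of inaccuracy the abstract attributes to existing models.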

Index Terms:
Information storage and retrieval, selection process.
Citation:
Yufei Tao, Jun Zhang, Dimitris Papadias, Nikos Mamoulis, "An Efficient Cost Model for Optimization of Nearest Neighbor Search in Low and Medium Dimensional Spaces," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 10, pp. 1169-1184, Oct. 2004, doi:10.1109/TKDE.2004.48