Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space
January/February 2000 (vol. 12 no. 1)
pp. 45-57

Abstract: Similarity search in multimedia databases requires efficient support for nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search, which corresponds to computing the Voronoi cell of each data point. In a second step, we store conservative approximations of the Voronoi cells in an index structure that is efficient for high-dimensional data spaces. As a result, nearest-neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e., it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates its high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest-neighbor search in other index structures such as the X-tree.
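
The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several simplifying assumptions that are not in the abstract: the data are two-dimensional points in the unit square, each Voronoi cell is computed by clipping the square against bisector half-planes, the conservative approximation is an axis-aligned bounding box, and a plain list scan stands in for the high-dimensional index structure. All function names (`clip`, `cell_bbox`, `candidates`, `nearest`) are hypothetical.

```python
import random

def clip(poly, a, b, c):
    """Clip a convex polygon to the half-plane a*x + b*y <= c."""
    out = []
    for i, p in enumerate(poly):
        q = poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:                      # p lies inside the half-plane
            out.append(p)
        if (fp < 0 < fq) or (fq < 0 < fp):  # edge p-q crosses the boundary
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def cell_bbox(i, pts, eps=1e-9):
    """Bounding box of the Voronoi cell of pts[i], restricted to the unit square.

    The box is padded by eps so that rounding errors cannot make it
    smaller than the true cell (it must stay conservative).
    """
    px, py = pts[i]
    poly = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    for j, (qx, qy) in enumerate(pts):
        if j != i:
            # Points at least as close to pts[i] as to pts[j]:
            # 2*(q - p) . x <= |q|^2 - |p|^2
            poly = clip(poly, 2 * (qx - px), 2 * (qy - py),
                        qx * qx + qy * qy - px * px - py * py)
    xs = [px] + [v[0] for v in poly]
    ys = [py] + [v[1] for v in poly]
    return (min(xs) - eps, min(ys) - eps, max(xs) + eps, max(ys) + eps)

def candidates(q, boxes):
    """Point query: indices of all boxes that contain q (list scan as a stand-in index)."""
    return [i for i, (x0, y0, x1, y1) in enumerate(boxes)
            if x0 <= q[0] <= x1 and y0 <= q[1] <= y1]

def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest(q, pts, boxes):
    """Nearest neighbor of q: point query on the boxes, then refine by distance."""
    cands = candidates(q, boxes) or range(len(pts))  # fallback for queries outside the square
    return min(cands, key=lambda i: dist2(q, pts[i]))

# Precomputation step: one conservative box per data point.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(30)]
boxes = [cell_bbox(i, points) for i in range(len(points))]
```

Because the boxes overlap, a point query usually returns a handful of candidates rather than exactly one, so the final distance comparison in `nearest` is still needed. What the precomputation buys is that only those candidates are examined instead of all data points.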

Index Terms:
Nearest neighbor search, high-dimensional indexing, efficient query processing, spatial databases, Voronoi diagrams.
Citation:
Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel, Thomas Seidl, "Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 1, pp. 45-57, Jan.-Feb. 2000, doi:10.1109/69.842249