Issue No.02 - February (2008 vol.30)
pp: 243-252
ABSTRACT
Similarity searching often reduces to finding the k nearest neighbors to a query object. Finding the k nearest neighbors is achieved by applying either a depth-first or a best-first algorithm to the search hierarchy containing the data. These algorithms are generally applicable to any index based on hierarchical clustering. The idea is that the data is partitioned into clusters which are aggregated to form other clusters, with the total aggregation being represented as a tree. These algorithms have traditionally used a lower bound corresponding to the minimum distance at which a nearest neighbor can be found (termed MinDist) to prune the search process by avoiding the processing of some of the clusters as well as individual objects when they can be shown to be farther from the query object q than all of the current k nearest neighbors of q. An alternative pruning technique that uses an upper bound corresponding to the maximum possible distance at which a nearest neighbor is guaranteed to be found (termed MaxNearestDist) is described. The MaxNearestDist upper bound is adapted to enable its use for finding the k nearest neighbors instead of just the nearest neighbor (i.e., k=1) as in its previous uses. Both the depth-first and best-first k-nearest neighbor algorithms are modified to use MaxNearestDist, which is shown to enhance both algorithms by overcoming their shortcomings. In particular, for the depth-first algorithm, the number of clusters in the search hierarchy that must be examined is not increased thereby potentially lowering its execution time, while for the best-first algorithm, the number of clusters in the search hierarchy that must be retained in the priority queue used to control the ordering of processing of the clusters is also not increased, thereby potentially lowering its storage requirements.
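To make the traditional MinDist pruning concrete, the following is a minimal Python sketch of a depth-first k-nearest neighbor search over a tree of clusters. It is an illustration under stated assumptions, not the paper's algorithm: clusters are assumed to be balls (a center and a covering radius) under the Euclidean distance, and the names `Cluster`, `make_cluster`, `all_points`, `min_dist`, and `knn_depth_first` are hypothetical. The sketch uses only the MinDist lower bound and does not implement the MaxNearestDist upper bound.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Cluster:
    """Hypothetical node of the search hierarchy: a ball covering its contents."""
    center: Point
    radius: float
    objects: List[Point] = field(default_factory=list)       # objects stored here
    children: List["Cluster"] = field(default_factory=list)  # sub-clusters


def all_points(c: Cluster) -> List[Point]:
    """Collect every object stored in c or below it."""
    pts = list(c.objects)
    for child in c.children:
        pts.extend(all_points(child))
    return pts


def make_cluster(objects=(), children=()) -> Cluster:
    """Build a ball (centroid, max distance to centroid) covering everything below."""
    node = Cluster(center=(), radius=0.0,
                   objects=list(objects), children=list(children))
    pts = all_points(node)
    node.center = tuple(sum(xs) / len(pts) for xs in zip(*pts))
    node.radius = max(math.dist(node.center, p) for p in pts)
    return node


def min_dist(q: Point, c: Cluster) -> float:
    """MinDist: lower bound on the distance from q to any object in c."""
    return max(0.0, math.dist(q, c.center) - c.radius)


def knn_depth_first(root: Cluster, q: Point, k: int):
    """Return up to k nearest objects as (distance, point), nearest first."""
    if k < 1:
        return []
    best = []  # max-heap of the current candidates, stored as (-distance, point)

    def kth_dist() -> float:
        # Distance of the current k-th nearest candidate (infinite until k are found).
        return -best[0][0] if len(best) == k else math.inf

    def visit(c: Cluster) -> None:
        for p in c.objects:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < kth_dist():
                heapq.heapreplace(best, (-d, p))
        # Visit children in increasing MinDist order so pruning kicks in early.
        for child in sorted(c.children, key=lambda ch: min_dist(q, ch)):
            if min_dist(q, child) >= kth_dist():
                break  # this child and all later ones cannot improve the result
            visit(child)

    visit(root)
    return sorted((-neg_d, p) for neg_d, p in best)
```

A small usage example, again with made-up data: build three leaf clusters, aggregate them under a root with `make_cluster(children=[...])`, and call `knn_depth_first(root, q, k)`. The result should match a brute-force scan over `all_points(root)` whenever the distances are free of ties.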
INDEX TERMS
k-nearest neighbors, similarity searching, metric spaces, depth-first nearest neighbor finding, best-first nearest neighbor finding
CITATION
Hanan Samet, "K-Nearest Neighbor Finding Using MaxNearestDist", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.30, no. 2, pp. 243-252, February 2008, doi:10.1109/TPAMI.2007.1182