Antipole Tree Indexing to Support Range Search and K-Nearest Neighbor Search in Metric Spaces
April 2005 (vol. 17 no. 4)
pp. 535-550
Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, a range search finds the objects of S within some threshold distance of q, whereas a k-nearest neighbor search returns the k elements of S closest to q. Both problems can be solved with a linear number of distance calculations by comparing the query object against every object in the database; the goal, however, is to solve them much faster. We combine and extend ideas from the M-Tree, the Multivantage Point structure, and the FQ-Tree to create a new structure in the "bisector tree" class, called the Antipole Tree. Bisection is based on proximity to an "Antipole" pair of elements generated by a suitable linear randomized tournament. The final winners a, b of such a tournament are far enough apart to approximate the diameter of the set being split. If dist(a, b) is larger than the chosen cluster diameter threshold, the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole Tree outperforms existing structures such as List of Clusters, M-Trees, and others by a factor of approximately two and, in many cases, achieves better clustering properties.
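
The ideas in the abstract can be illustrated with a minimal Python sketch. It shows the linear-scan baseline for both query types and one possible randomized tournament for choosing a far-apart pair before deciding whether to split a cluster. This is not the paper's procedure: the abstract does not specify the tournament, so the rule used here (keep the farthest pair in each random group) is an assumption, and all function names, the `group_size` parameter, and the tie-breaking rule are hypothetical.

```python
import heapq
import itertools
import random


def range_search(objects, dist, q, radius):
    """Linear-scan baseline: one distance computation per database object."""
    return [x for x in objects if dist(x, q) <= radius]


def knn_search(objects, dist, q, k):
    """Linear-scan baseline: the k objects closest to q, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda x: dist(x, q))


def farthest_pair(objects, dist):
    """Exact farthest pair by brute force; only used on small groups."""
    return max(itertools.combinations(objects, 2), key=lambda p: dist(*p))


def antipole_pair(objects, dist, group_size=3, rng=None):
    """Approximate a farthest pair with a randomized tournament.

    Assumed rule (not taken from the paper): each round shuffles the
    survivors, cuts them into groups of `group_size`, and keeps only the
    two farthest-apart members of every full group. Requires at least
    two objects.
    """
    if group_size < 3:
        raise ValueError("group_size must be at least 3 so each round shrinks")
    rng = rng or random.Random()
    survivors = list(objects)
    while len(survivors) > group_size:
        rng.shuffle(survivors)
        winners = []
        for i in range(0, len(survivors), group_size):
            group = survivors[i:i + group_size]
            if len(group) == group_size:
                winners.extend(farthest_pair(group, dist))
            else:
                winners.extend(group)  # leftover group advances unchanged
        survivors = winners
    return farthest_pair(survivors, dist)


def split_cluster(objects, dist, diameter_threshold, rng=None):
    """Split only if the tournament winners are farther apart than the threshold.

    Returns None when the cluster is kept whole, otherwise
    (a, b, objects nearer to a, objects nearer to b). Ties go to a.
    """
    a, b = antipole_pair(objects, dist, rng=rng)
    if dist(a, b) <= diameter_threshold:
        return None
    near_a = [x for x in objects if dist(x, a) <= dist(x, b)]
    near_b = [x for x in objects if dist(x, a) > dist(x, b)]
    return a, b, near_a, near_b
```

With `group_size=3`, each round spends three distance computations per group and keeps about two thirds of the survivors, so the total number of distance computations stays linear in the size of the set. Applying `split_cluster` recursively to `near_a` and `near_b` would give a bisector-style tree whose leaves are clusters no wider than the threshold, as measured by the approximate pair.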

Index Terms:
Indexing methods, similarity measures, information search and retrieval.
Citation:
Domenico Cantone, Alfredo Ferro, Alfredo Pulvirenti, Diego Reforgiato Recupero, Dennis Shasha, "Antipole Tree Indexing to Support Range Search and K-Nearest Neighbor Search in Metric Spaces," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 535-550, April 2005, doi:10.1109/TKDE.2005.53