Effective Proximity Retrieval by Ordering Permutations
September 2008 (vol. 30 no. 9)
pp. 1-1
We introduce a new probabilistic proximity search algorithm for range and k-nearest neighbor (k-NN) searching in both coordinate and metric spaces. Although solutions to these problems exist, they degenerate to a linear scan when the space is intrinsically high dimensional, as is the case in many pattern recognition tasks. This, for example, makes the k-NN approach to classification rather slow on large databases. Our novel idea is to predict closeness between elements according to how they order their distances to a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest from itself, and the similarity between these orderings turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, on both synthetic and real, metric and nonmetric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
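
The ordering idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how two orderings are compared, so the sum of rank differences (Spearman footrule) is used here as an assumption. The names `anchor_ranks`, `footrule`, `build_index`, and `knn_search`, the `fraction` parameter, and the random choice of anchors are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Example distance; any distance function could be passed in instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anchor_ranks(obj, anchors, dist):
    """Return ranks[i] = position of anchor i once the anchors are sorted
    from closest to farthest from obj."""
    order = sorted(range(len(anchors)), key=lambda i: dist(obj, anchors[i]))
    ranks = [0] * len(anchors)
    for position, anchor_id in enumerate(order):
        ranks[anchor_id] = position
    return ranks


def footrule(ranks_a, ranks_b):
    """Assumed dissimilarity between two orderings: sum of rank differences."""
    return sum(abs(x - y) for x, y in zip(ranks_a, ranks_b))


def build_index(database, anchors, dist):
    """Precompute the anchor ordering of every database element."""
    return [anchor_ranks(obj, anchors, dist) for obj in database]


def knn_search(query, database, anchors, index, dist, k, fraction):
    """Probabilistic k-NN: rank elements by ordering similarity to the query,
    then compute true distances only for the most promising fraction."""
    q_ranks = anchor_ranks(query, anchors, dist)
    promising = sorted(range(len(database)),
                       key=lambda i: footrule(q_ranks, index[i]))
    budget = max(k, int(fraction * len(database)))
    scanned = promising[:budget]
    scanned.sort(key=lambda i: dist(query, database[i]))
    return scanned[:k]


if __name__ == "__main__":
    random.seed(0)
    db = [tuple(random.random() for _ in range(16)) for _ in range(1000)]
    anchors = random.sample(db, 32)  # hypothetical choice: random anchors
    index = build_index(db, anchors, euclidean)
    q = tuple(random.random() for _ in range(16))
    print(knn_search(q, db, anchors, index, euclidean, k=5, fraction=0.1))
```

With `fraction=1.0` every element is compared directly, so the search is exact. Smaller values trade accuracy for fewer distance computations, which is the sense in which such a search is probabilistic.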

Index Terms:
Extraterrestrial measurements, Pattern recognition, Databases, Computer Society, Feature extraction, Information retrieval, Support vector machines, Support vector machine classification, Neural networks, Sequences, Implementation, Data Structures, Data Storage Representations, Indexing methods, Information Storage and Retrieval, Information Search and Retrieval
Citation:
"Effective Proximity Retrieval by Ordering Permutations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1-1, Sept. 2008, doi:10.1109/TPAMI.2007.70815