Issue No.01 - January (2008 vol.30)
pp: 89-104
ABSTRACT
This paper describes BoostMap, a method for efficient nearest neighbor retrieval under computationally expensive distance measures. Database and query objects are embedded into a vector space, in which distances can be measured efficiently. Each embedding is treated as a classifier that predicts for any three objects X, A, B whether X is closer to A or to B. It is shown that a linear combination of such embedding-based classifiers naturally corresponds to an embedding and a distance measure. Based on this property, the BoostMap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier. The classification accuracy of the resulting strong classifier is a direct measure of the amount of nearest neighbor structure preserved by the embedding. An important property of BoostMap is that the embedding optimization criterion is equally valid in both metric and non-metric spaces. Performance is evaluated in databases of hand images, handwritten digits, and time series. In all cases, BoostMap significantly improves retrieval efficiency with small losses in accuracy compared to brute-force search. Moreover, BoostMap significantly outperforms existing nearest neighbor retrieval methods, such as Lipschitz embeddings, FastMap, and VP-trees.
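The correspondence between a weighted sum of triple classifiers and an embedding with a distance measure can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one-dimensional embeddings defined by the distance to a reference object, a weighted L1 distance in the embedded space, and hand-picked weights where boosting would normally choose them. All function names, the toy 2D points, and the use of Euclidean distance as a stand-in for an expensive measure are hypothetical.

```python
import math

def reference_embedding(dist, ref):
    """1D embedding (assumed form): map x to its distance from a reference object."""
    return lambda x: dist(x, ref)

def triple_classifier(f, q, a, b):
    """Positive output predicts that q is closer to a than to b under embedding f."""
    return abs(f(q) - f(b)) - abs(f(q) - f(a))

def embed(x, coords):
    """Stack several 1D embeddings into one multi-dimensional embedding."""
    return [f(x) for f in coords]

def weighted_l1(u, v, weights):
    """Weighted L1 distance between two embedded vectors."""
    return sum(w * abs(ui - vi) for w, ui, vi in zip(weights, u, v))

def combined_classifier(q, a, b, coords, weights):
    """Linear combination of the 1D triple classifiers."""
    return sum(w * triple_classifier(f, q, a, b) for f, w in zip(coords, weights))

def filter_and_refine(query, database, dist, coords, weights, p):
    """Rank by the cheap embedded distance, then re-rank the top p with dist."""
    eq = embed(query, coords)
    ranked = sorted(database,
                    key=lambda x: weighted_l1(eq, embed(x, coords), weights))
    return min(ranked[:p], key=lambda x: dist(query, x))

# Toy setup: Euclidean distance stands in for an expensive distance measure.
def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

references = [(0.0, 0.0), (10.0, 0.0)]   # hypothetical reference objects
weights = [0.7, 0.3]                     # hypothetical; boosting would learn these
coords = [reference_embedding(euclid, r) for r in references]

database = [(1.0, 1.0), (4.0, 5.0), (9.0, 2.0), (6.0, 6.0)]
query = (5.0, 5.0)
nearest = filter_and_refine(query, database, euclid, coords, weights, p=2)
```

In this sketch the combined classifier output for a triple (q, a, b) equals the weighted L1 distance from q to b minus that from q to a in the embedded space, so its sign says which of a and b the embedding ranks closer to q. The parameter `p` trades accuracy for speed: only `p` exact distance computations are made per query after the filter step, in addition to the distances from the query to the reference objects.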
INDEX TERMS
Indexing methods, embedding methods, similarity matching, multimedia databases, nearest neighbor retrieval, nearest neighbor classification, non-Euclidean spaces
CITATION
Vassilis Athitsos, Jonathan Alon, Stan Sclaroff, George Kollios, "BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 1, pp. 89-104, January 2008, doi:10.1109/TPAMI.2007.1140