Issue No.02 - February (2012 vol.34)
pp: 402-409
D. Gorisse, Yakaz Lab., Paris, France
M. Cord, LIP6, Sorbonne Univ., Paris, France
F. Precioso, Lab. d'Inf., Signaux et Syst. (I3S), Univ. de Nice - Sophia Antipolis, Sophia Antipolis, France
In the past 10 years, powerful new algorithms based on efficient data structures have been proposed to solve the problem of Nearest Neighbors search (or Approximate Nearest Neighbors search). The Euclidean Locality Sensitive Hashing algorithm, which provides approximate nearest neighbors in a Euclidean space with sublinear complexity, is probably the most popular of these. However, the Euclidean metric does not always provide results as accurate and relevant as similarity measures such as the Earth Mover's Distance and the χ² distance. In this paper, we present a new LSH scheme adapted to the χ² distance for approximate nearest neighbors search in high-dimensional spaces. We define the specific hashing functions, prove their locality sensitivity, and compare our method experimentally with the Euclidean Locality Sensitive Hashing algorithm in the context of image retrieval on real image databases. The results demonstrate the relevance of the new LSH scheme: it provides either far better accuracy than the Euclidean scheme at equivalent speed, or equivalent accuracy with a large gain in processing speed.
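To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one common definition of the χ² distance between non-negative histograms, the square root of the sum of (x_i − y_i)² / (x_i + y_i); other conventions drop the square root or add a factor of 1/2. It also shows the standard Euclidean LSH hash based on Gaussian (2-stable) projections, h(x) = ⌊(a·x + b) / w⌋, as the baseline. The function names and the bucket width `w = 4.0` are illustrative assumptions, and the χ²-adapted hash functions defined in the paper are not reproduced here.

```python
import math
import random


def euclidean_distance(x, y):
    """Plain L2 distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def chi2_distance(x, y):
    """Chi-square distance between two non-negative histograms.

    Assumed convention: sqrt(sum_i (x_i - y_i)^2 / (x_i + y_i)).
    Bins where both entries are zero contribute nothing.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        s = xi + yi
        if s > 0:
            total += (xi - yi) ** 2 / s
    return math.sqrt(total)


def make_e2lsh_hash(dim, w, rng):
    """Build one Euclidean LSH hash function h(x) = floor((a.x + b) / w).

    a has i.i.d. standard Gaussian entries (a 2-stable distribution),
    b is drawn uniformly from [0, w), and w is the bucket width.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(x):
        projection = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((projection + b) / w)

    return h


# Two pairs of histograms separated by the same Euclidean distance.
x, y = [0.5, 0.5], [0.4, 0.6]   # mass moved between two well-filled bins
u, v = [0.0, 1.0], [0.1, 0.9]   # mass moved into an empty bin

print(euclidean_distance(x, y), euclidean_distance(u, v))
print(chi2_distance(x, y), chi2_distance(u, v))

# One Euclidean LSH hash function with an illustrative bucket width.
h = make_e2lsh_hash(dim=2, w=4.0, rng=random.Random(0))
print(h(x), h(y))
```

Both pairs are equally far apart in the Euclidean sense, yet the χ² distance rates the second pair as more dissimilar because the same change lands in a nearly empty bin. A hash family built for the Euclidean metric cannot reflect that difference, which is the motivation for a scheme adapted to χ².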
visual databases, approximation theory, data structures, image retrieval, pattern classification, image databases, Chi2 distance, nearest neighbors search, Euclidean locality sensitive hashing algorithm, nearest neighbors approximation, Euclidean space, Euclidean metric, earth mover distance, Information retrieval, Databases, Measurement, Approximation algorithms, Histograms, Approximation methods, Semantics, Sublinear algorithm, approximate nearest neighbors, locality sensitive hashing
D. Gorisse, M. Cord, F. Precioso, "Locality-Sensitive Hashing for Chi2 Distance", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.34, no. 2, pp. 402-409, February 2012, doi:10.1109/TPAMI.2011.193