Aggregating Local Image Descriptors into Compact Codes
Sept. 2012 (vol. 34 no. 9)
pp. 1704-1716
H. Jegou, INRIA, Rennes, France
F. Perronnin, Xerox, Meylan, France
M. Douze, INRIA, St. Ismier, France
J. Sanchez, Res. Centre in Inf. for Eng., UTN, Córdoba, Argentina
P. Perez, Technicolor, Cesson-Sevigne, France
C. Schmid, INRIA, St. Ismier, France
This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100 million image data set takes about 250 ms on one processor core.
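
As a minimal sketch of the baseline the abstract refers to, the snippet below aggregates a set of local descriptors into a single fixed-length vector with the bag-of-visual-words approach, then estimates the memory needed to index a large collection. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy 2-D descriptors, the three-word codebook, and the figure of 20 bytes per image (one possible reading of "a few dozen bytes") are all hypothetical. The Fisher kernel aggregation, dimensionality reduction, and indexing evaluated in the paper are not reproduced here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bag_of_visual_words(descriptors, codebook):
    """Aggregate local descriptors into one L1-normalized histogram.

    Each descriptor votes for its nearest codebook entry (visual word),
    so the output length equals the codebook size regardless of how
    many descriptors the image produced.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def index_size_bytes(num_images, bytes_per_image):
    """Total memory for the compact codes of a whole collection."""
    return num_images * bytes_per_image


# Toy data (hypothetical): 2-D descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]

histogram = bag_of_visual_words(descriptors, codebook)

# Assumed code size of 20 bytes per image for a 100 million image data set.
footprint = index_size_bytes(100_000_000, 20)

print(histogram)
print(footprint / 1e9, "GB")
```

Under the assumed 20 bytes per image, the codes for 100 million images occupy about 2 GB, which is why such a collection can be searched in memory on a single machine.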

Index Terms:
vectors, image representation, image retrieval, indexing, time 250 ms, local image descriptor aggregation, compact codes, large-scale image search, search accuracy, search efficiency, memory usage, Fisher kernel, reference bag-of-visual words approach, vector dimension, dimensionality reduction optimization, indexing optimization, vector comparison, compact representation, accuracy, visualization, kernel, image search
Citation:
H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, C. Schmid, "Aggregating Local Image Descriptors into Compact Codes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1704-1716, Sept. 2012, doi:10.1109/TPAMI.2011.235