Issue No.08 - August (2008 vol.30)
pp: 1371-1384
ABSTRACT
This paper introduces a discriminative model for the retrieval of images from text queries. Our approach formalizes the retrieval task as a ranking problem and introduces a learning procedure that optimizes a criterion related to ranking performance. The proposed model hence addresses the retrieval problem directly and, in contrast with previous research, does not rely on an intermediate image annotation task. Moreover, our learning procedure builds upon recent work on the online learning of kernel-based classifiers. This yields an efficient, scalable algorithm that can benefit from recent kernels developed for image comparison. Experiments performed over stock photography data show the advantage of our discriminative ranking approach over state-of-the-art alternatives (e.g., our model yields 26.3% average precision over the Corel dataset, compared to 22.0% for the best alternative model evaluated). Further analysis of the results shows that our model is especially advantageous for difficult queries, such as queries with few relevant pictures or multiple-word queries.
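The ranking formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear scoring function score(q, p) = q^T W p in place of the kernel-based model, a pairwise hinge loss over (query, relevant image, non-relevant image) triplets, and a passive-aggressive style step size capped by a hypothetical aggressiveness parameter C. The function names and the toy vectors are illustrative only. The second function computes average precision, the measure quoted in the abstract, for a single ranked list.

```python
import numpy as np


def score(W, q, p):
    """Relevance score of image vector p for query vector q (linear model, an assumption)."""
    return float(q @ W @ p)


def pa_rank_step(W, q, p_pos, p_neg, C=1.0):
    """One online update on a (query, relevant image, non-relevant image) triplet.

    The pairwise hinge loss asks the relevant image to outscore the
    non-relevant one by a margin of 1. If that already holds, W is
    returned unchanged (the "passive" case).
    """
    loss = max(0.0, 1.0 - score(W, q, p_pos) + score(W, q, p_neg))
    if loss == 0.0:
        return W
    # Direction in which the margin on this triplet increases fastest.
    V = np.outer(q, p_pos - p_neg)
    norm_sq = float(np.sum(V * V))
    if norm_sq == 0.0:
        return W
    # Smallest step that fixes the violation, capped by C (the "aggressive" case).
    tau = min(C, loss / norm_sq)
    return W + tau * V


def average_precision(ranked_relevance):
    """Average precision of one ranked list given as 0/1 relevance flags."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy usage with made-up vectors: 2 query terms, 3 image features.
W = np.zeros((2, 3))
q = np.array([1.0, 0.0])
p_pos = np.array([1.0, 0.0, 0.0])
p_neg = np.array([0.0, 1.0, 0.0])
W = pa_rank_step(W, q, p_pos, p_neg, C=10.0)
margin = score(W, q, p_pos) - score(W, q, p_neg)
```

In this sketch each update touches only one triplet, which is what makes an online procedure of this kind scale to large training sets; swapping the inner product for an image kernel would move it closer to the kernel-based setting the abstract describes.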
INDEX TERMS
image retrieval, ranking, discriminative learning, kernel-based classifier, large margin
CITATION
David Grangier, Samy Bengio, "A Discriminative Kernel-Based Approach to Rank Images from Text Queries", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 8, pp. 1371-1384, August 2008, doi:10.1109/TPAMI.2007.70791