Modeling Semantic Aspects for Cross-Media Image Indexing
October 2007 (vol. 29 no. 10)
pp. 1802-1817
To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, which then allow semantic indices to be created automatically for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives for learning a Probabilistic Latent Semantic Analysis (PLSA) model for annotated images and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure for a PLSA model of annotated images is a standard EM algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two are based on asymmetric PLSA learning, which allows the definition of the latent space to be constrained by either the visual or the textual modality. We demonstrate that the textual modality is more appropriate for learning a semantically meaningful latent space, which translates into improved annotation performance. We compare our learning algorithms with recent methods on a standard dataset, and a detailed evaluation of the performance shows the validity of our framework.
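As a rough illustration of the first learning procedure mentioned in the abstract (standard EM over both modalities treated equivalently), the sketch below fits a generic PLSA model to a count matrix formed by concatenating visual-term counts and caption-word counts. It is not the authors' implementation: the function name `plsa_em`, the toy counts, the number of aspects, and the dense three-way posterior array are all assumptions chosen for readability, and the asymmetric variants are not shown.

```python
import numpy as np

def plsa_em(counts, n_aspects, n_iter=50, seed=0):
    """Fit a generic PLSA model by EM (illustrative sketch).

    counts: (n_docs, n_terms) nonnegative count matrix.
    Returns P(z|d) with shape (n_docs, n_aspects) and
    P(x|z) with shape (n_aspects, n_terms).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialization, normalized to valid distributions.
    p_z_d = rng.random((n_docs, n_aspects))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_x_z = rng.random((n_aspects, n_terms))
    p_x_z /= p_x_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z | d, x) proportional to P(z|d) * P(x|z).
        # Dense (n_docs, n_aspects, n_terms) array: fine for toy data only.
        joint = p_z_d[:, :, None] * p_x_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post
        p_x_z = weighted.sum(axis=0)
        p_x_z /= p_x_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_x_z

# Toy data (made up): 4 images, 3 quantized visual terms, 2 caption words.
visual = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 5, 3]], dtype=float)
text = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# Treating both modalities equivalently: concatenate their counts.
joint_counts = np.hstack([visual, text])
p_z_d, p_x_z = plsa_em(joint_counts, n_aspects=2)

# Caption-word distribution per aspect, renormalized over the text columns,
# then P(w|d) = sum_z P(w|z) P(z|d) as an annotation score for each image.
p_w_z = p_x_z[:, visual.shape[1]:]
p_w_z = p_w_z / (p_w_z.sum(axis=1, keepdims=True) + 1e-12)
p_w_d = p_z_d @ p_w_z
```

In this sketch the rows of `p_w_d` rank caption words for each image. For an unannotated image, one would instead estimate its aspect mixture from the visual counts alone while holding the learned aspect distributions fixed; that step is omitted here.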


Index Terms:
Image annotation, textual indexing, image retrieval, quantized local descriptors, latent aspect modeling
Citation:
Florent Monay, Daniel Gatica-Perez, "Modeling Semantic Aspects for Cross-Media Image Indexing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1802-1817, Oct. 2007, doi:10.1109/TPAMI.2007.1097