Issue No.05 - May (2012 vol.34)
pp: 902-917
Nikhil Rasiwasia , University of California at San Diego, La Jolla
Nuno Vasconcelos , University of California at San Diego, La Jolla
ABSTRACT
A novel framework for context modeling, based on the probability of co-occurrence of objects and scenes, is proposed. The modeling is quite simple and builds upon the availability of robust appearance classifiers. Images are represented by their posterior probabilities with respect to a set of contextual models, built upon the bag-of-features image representation, through two layers of probabilistic modeling. The first layer represents the image in a semantic space, where each dimension encodes an appearance-based posterior probability with respect to a concept. Due to the inherent ambiguity of classifying image patches, this representation suffers from a certain amount of contextual noise. The second layer enables robust inference in the presence of this noise by modeling the distribution of each concept in the semantic space. A thorough and systematic experimental evaluation of the proposed context modeling is presented. The resulting representation is shown to capture the contextual “gist” of natural images. Scene classification experiments show that contextual classifiers outperform their appearance-based counterparts, irrespective of the precise choice and accuracy of the latter. The effectiveness of the proposed approach is further demonstrated through a comparison to existing approaches to scene classification and image retrieval on benchmark data sets. In all cases, the proposed approach achieves superior results.
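The two layers described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify the distributions involved. The sketch assumes a uniform prior over concepts, averages patch posteriors to obtain the image's point in the semantic space, and models each concept in that space with a single Dirichlet whose parameters are set from the training mean. All function names, the `precision` and `floor` parameters, and the toy numbers are hypothetical.

```python
import math

def patch_posteriors(likelihoods):
    """Bayes rule with an assumed uniform concept prior for one patch."""
    total = sum(likelihoods)
    return [l / total for l in likelihoods]

def semantic_vector(patch_likelihoods):
    """First layer: average patch posteriors into one point on the simplex."""
    posts = [patch_posteriors(p) for p in patch_likelihoods]
    n = len(posts)
    return [sum(col) / n for col in zip(*posts)]

def fit_dirichlet(vectors, precision=20.0, floor=1e-3):
    """Second layer (assumed form): one Dirichlet per concept,
    with alpha proportional to the mean training semantic vector."""
    n = len(vectors)
    mean = [max(sum(col) / n, floor) for col in zip(*vectors)]
    z = sum(mean)
    return [precision * m / z for m in mean]

def dirichlet_log_pdf(x, alpha, floor=1e-6):
    """Log density of a Dirichlet at x (entries floored to avoid log(0))."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum(
        (a - 1.0) * math.log(max(xi, floor)) for a, xi in zip(alpha, x)
    )

def contextual_posteriors(x, models):
    """Posterior over concepts given semantic vector x (uniform prior)."""
    logs = [dirichlet_log_pdf(x, alpha) for alpha in models]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Toy training data: semantic vectors of images labeled with each of 3 concepts.
training = [
    [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]],  # concept 0
    [[0.3, 0.6, 0.1], [0.2, 0.7, 0.1]],  # concept 1
    [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],  # concept 2
]
models = [fit_dirichlet(vectors) for vectors in training]

# Toy query image: appearance likelihoods of three patches under each concept.
query_patches = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.3, 0.5, 0.2]]
x = semantic_vector(query_patches)      # first-layer representation
post = contextual_posteriors(x, models)  # second-layer representation
print(x, post)
```

In this toy setting the third patch favors concept 1 on appearance alone, which stands in for the contextual noise the abstract mentions. The second layer scores the whole vector of posteriors against each concept's typical co-occurrence pattern instead of relying on any single entry.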
INDEX TERMS
Computer vision, scene classification, context, image retrieval, topic models.
CITATION
Nikhil Rasiwasia, Nuno Vasconcelos, "Holistic Context Models for Visual Recognition," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 5, pp. 902-917, May 2012, doi:10.1109/TPAMI.2011.175