Issue No.07 - July (2010 vol.32)
pp: 1271-1283
Jan C. van Gemert, Ecole Normale Supérieure, Paris
Cor J. Veenman, University of Amsterdam, Amsterdam
Arnold W.M. Smeulders, University of Amsterdam, Amsterdam
Jan-Mark Geusebroek, University of Amsterdam, Amsterdam
ABSTRACT
This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.
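The contrast between hard and soft assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy codebook, the Euclidean distance, the Gaussian kernel, and the bandwidth `sigma` are all assumptions chosen for illustration, and the normalized weighting shown is only one plausible soft-assignment scheme, not necessarily one of the four studied in the paper.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_histogram(features, codebook):
    """Traditional codebook model: each feature votes for its nearest visual word only."""
    hist = [0.0] * len(codebook)
    for f in features:
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        hist[nearest] += 1.0
    return [h / len(features) for h in hist]

def soft_histogram(features, codebook, sigma=1.0):
    """Illustrative soft assignment (an assumption, not the paper's exact method):
    each feature spreads one unit of mass over all words, weighted by a
    Gaussian kernel on its distance to each word."""
    hist = [0.0] * len(codebook)
    for f in features:
        dists = [sq_dist(f, w) for w in codebook]
        d_min = min(dists)
        # Subtracting d_min leaves the normalized weights unchanged but keeps
        # the nearest word's weight at exactly 1, which avoids underflow.
        weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in dists]
        total = sum(weights)
        for k, w in enumerate(weights):
            hist[k] += w / total
    return [h / len(features) for h in hist]

# Hypothetical toy data: two visual words and two features lying near the
# boundary between them, where assignment is ambiguous.
codebook = [(0.0, 0.0), (2.0, 0.0)]
features = [(0.9, 0.0), (0.95, 0.0)]

hard = hard_histogram(features, codebook)
soft = soft_histogram(features, codebook, sigma=1.0)
print("hard:", hard)
print("soft:", soft)
```

In this toy case both features sit almost midway between the two words. The hard histogram places all mass on the first word, while the soft histogram keeps a substantial share on the second word, which is the ambiguity the abstract argues should be modeled explicitly.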
INDEX TERMS
Computer vision, object recognition, image/video retrieval.
CITATION
Jan C. van Gemert, Cor J. Veenman, Arnold W.M. Smeulders, Jan-Mark Geusebroek, "Visual Word Ambiguity," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 7, pp. 1271-1283, July 2010, doi:10.1109/TPAMI.2009.132