Issue No.03 - March (2010 vol.32)
pp: 501-516
Björn Ommer, University of California at Berkeley, Berkeley
ABSTRACT
Real-world scene understanding requires recognizing object categories in novel visual scenes. This paper describes a composition system that automatically learns structured, hierarchical object representations in an unsupervised manner without requiring manual segmentation or manual object localization. A central concept for learning object models in the challenging, general case of unconstrained scenes, large intraclass variations, large numbers of categories, and lacking supervision information is to exploit the compositional nature of our (visual) world. The compositional nature of visual objects significantly limits their representation complexity and renders learning of structured object models statistically and computationally tractable. We propose a robust descriptor for local image parts and show how characteristic compositions of parts can be learned that are based on an unspecific part vocabulary shared between all categories. Moreover, a Bayesian network is presented that comprises all the compositional constituents together with scene context and object shape. Object recognition is then formulated as a statistical inference problem in this probabilistic model.
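To make the last sentence of the abstract concrete, the following is a minimal sketch of recognition as statistical inference over a part vocabulary shared between all categories. It is not the paper's Bayesian network: it assumes a much simpler model in which each image is a list of codeword indices, image-level category labels are available for training, and parts are conditionally independent given the category. The names `train` and `posterior` and the smoothing parameter `alpha` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def train(labeled_images, vocab_size, alpha=1.0):
    """Estimate per-category log prior and log part likelihoods.

    labeled_images: list of (category, [codeword indices]) pairs.
    vocab_size: size of the part vocabulary shared by all categories.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    counts = {}
    n_images = Counter()
    for category, words in labeled_images:
        n_images[category] += 1
        c = counts.setdefault(category, [0] * vocab_size)
        for w in words:
            c[w] += 1

    total = sum(n_images.values())
    model = {}
    for category, c in counts.items():
        denom = sum(c) + alpha * vocab_size
        log_lik = [math.log((k + alpha) / denom) for k in c]
        log_prior = math.log(n_images[category] / total)
        model[category] = (log_prior, log_lik)
    return model


def posterior(model, words):
    """Return P(category | observed parts) for every category."""
    scores = {
        cat: log_prior + sum(log_lik[w] for w in words)
        for cat, (log_prior, log_lik) in model.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {cat: math.exp(s - m) / z for cat, s in scores.items()}


# Toy data: two categories over a shared vocabulary of four part codewords.
data = [("car", [0, 0, 1]), ("face", [2, 2, 3])]
model = train(data, vocab_size=4)
p = posterior(model, [0, 1])
best = max(p, key=p.get)
```

Recognition then amounts to picking the category with the highest posterior. The model described in the abstract additionally couples compositions of parts with scene context and object shape, which this sketch leaves out.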
INDEX TERMS
Image categorization, object recognition, compositionality, graphical models, visual learning.
CITATION
Björn Ommer, "Learning the Compositional Nature of Visual Object Categories for Recognition", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.32, no. 3, pp. 501-516, March 2010, doi:10.1109/TPAMI.2009.22