Unsupervised Learning of Probabilistic Grammar-Markov Models for Object Categories
January 2009 (vol. 31 no. 1)
pp. 114-128
Long Zhu, UCLA, Los Angeles
Yuanhao Chen, USTC, Hefei
Alan Yuille, UCLA, Los Angeles
We introduce a Probabilistic Grammar-Markov Model (PGMM) which couples probabilistic context-free grammars and Markov Random Fields. These PGMMs are generative models defined over attributed features and are used to detect and classify objects in natural images. PGMMs are designed so that they can perform rapid inference, parameter learning, and the more difficult task of structure induction. In both inference and learning, PGMMs can deal with unknown 2D pose (position, orientation, and scale) and with different appearances, or aspects, of the model. The PGMMs can be learnt in an unsupervised manner, where the image can contain one of an unknown number of objects of different categories or even be pure background. We first study the weakly supervised case, where each image contains an example of the (single) object of interest, and then generalize to less supervised cases. The goal of this paper is theoretical but, to provide proof of concept, we demonstrate results from this approach on a subset of the Caltech dataset (learning on a training set and evaluating on a testing set). Our results are generally comparable with the current state of the art, and our inference is performed in less than five seconds.
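The coupling the abstract describes can be illustrated with a toy sketch: a grammar-style OR node selects one "aspect" (a set of parts for a given viewpoint), and an MRF-style pairwise potential scores the spatial layout of the detected parts, with translation invariance obtained by measuring offsets relative to a reference part. This is not the authors' implementation; the aspects, part names, and parameters below are illustrative assumptions.

```python
# Toy PGMM-flavored sketch (illustrative assumptions, not the paper's model):
# an OR node over "aspects", each aspect an MRF over part positions.

# Each aspect: list of (part_name, ideal_offset) relative to a reference part.
ASPECTS = {
    "frontal": [("wheel", (0.0, 0.0)), ("window", (1.0, 1.0))],
    "side":    [("wheel", (0.0, 0.0)), ("wheel2", (2.0, 0.0))],
}

def pair_potential(obs_offset, ideal_offset, sigma=0.5):
    """Gaussian log-potential on the offset between two parts (MRF edge)."""
    dx = obs_offset[0] - ideal_offset[0]
    dy = obs_offset[1] - ideal_offset[1]
    return -(dx * dx + dy * dy) / (2.0 * sigma * sigma)

def score_aspect(aspect, detections):
    """Log-score of one aspect given detected part positions.

    detections: dict part_name -> (x, y). Offsets are measured relative to
    the first (reference) part, so the score is translation invariant.
    """
    ref_name, _ = aspect[0]
    if ref_name not in detections:
        return float("-inf")
    rx, ry = detections[ref_name]
    total = 0.0
    for name, (ix, iy) in aspect[1:]:
        if name not in detections:
            return float("-inf")
        px, py = detections[name]
        total += pair_potential((px - rx, py - ry), (ix, iy))
    return total

def classify(detections):
    """Grammar OR node: pick the best-scoring aspect (MAP over aspects)."""
    return max(ASPECTS, key=lambda a: score_aspect(ASPECTS[a], detections))

# A side view: two wheels roughly 2 units apart.
obs = {"wheel": (5.0, 3.0), "wheel2": (7.1, 3.0), "window": (9.0, 9.0)}
print(classify(obs))  # prints "side": that aspect best explains the offsets
```

In the paper, the equivalent of `score_aspect` would also include appearance terms over attributed features, pose would include rotation and scale rather than translation alone, and the set of aspects and parts would itself be induced from data rather than fixed by hand.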


Index Terms:
Structural, Computer vision, Machine learning
Citation:
Long Zhu, Yuanhao Chen, Alan Yuille, "Unsupervised Learning of Probabilistic Grammar-Markov Models for Object Categories," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 114-128, Jan. 2009, doi:10.1109/TPAMI.2008.67