

Issue No.02 - February (2012 vol.34)

pp: 359-371

Long Zhu, Univ. of California, Los Angeles, Los Angeles, CA, USA

Yuanhao Chen, Univ. of California, Los Angeles, Los Angeles, CA, USA

Yuan Lin, Shanghai Jiaotong Univ., Shanghai, China

Chenxi Lin, Alibaba Group R&D, Beijing, China

A. Yuille, Univ. of California, Los Angeles, Los Angeles, CA, USA

ABSTRACT

In this paper, we propose a Hierarchical Image Model (HIM) which parses images to perform segmentation and object recognition. The HIM represents the image recursively by segmentation and recognition templates at multiple levels of the hierarchy. This has advantages for representation, inference, and learning. First, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependencies and exploiting different levels of contextual information (similar to how natural language models represent sentence structure in terms of hierarchical representations such as verb and noun phrases). Second, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which yields the first polynomial time algorithm for image labeling. Third, we learn the HIM efficiently using machine learning methods from a labeled data set. We demonstrate that the HIM is comparable to state-of-the-art methods by evaluating it on the challenging public MSRC and PASCAL VOC 2007 image data sets.
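The dynamic-programming inference mentioned in the abstract can be illustrated with a generic max-sum pass over a tree. The sketch below is not the HIM itself: the `Node` class, the per-label `unary` scores, and the `pairwise` parent-child compatibility function are hypothetical stand-ins, and each node chooses among a handful of plain labels rather than segmentation and recognition templates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# For each label of a node: (best subtree score, labels in preorder).
Table = Dict[str, Tuple[float, List[str]]]


@dataclass
class Node:
    """Hypothetical tree node: a score per candidate label, plus children."""
    unary: Dict[str, float]
    children: List["Node"] = field(default_factory=list)


def best_subtree(node: Node, pairwise: Callable[[str, str], float]) -> Table:
    """Bottom-up pass: best score of the subtree for every label of `node`."""
    child_tables = [best_subtree(child, pairwise) for child in node.children]
    table: Table = {}
    for label, score in node.unary.items():
        total, labels = score, [label]
        for child_table in child_tables:
            # Pick the child label that maximizes child score + compatibility.
            best = max(
                child_table,
                key=lambda c: child_table[c][0] + pairwise(label, c),
            )
            total += child_table[best][0] + pairwise(label, best)
            labels = labels + child_table[best][1]
        table[label] = (total, labels)
    return table


def parse(root: Node, pairwise: Callable[[str, str], float]) -> Tuple[float, List[str]]:
    """Return the best total score and the preorder label assignment."""
    table = best_subtree(root, pairwise)
    best = max(table, key=lambda label: table[label][0])
    return table[best]


def agree(parent_label: str, child_label: str) -> float:
    """Assumed compatibility: reward a child that matches its parent."""
    return 1.0 if parent_label == child_label else 0.0


# Toy two-level hierarchy with made-up scores.
left = Node({"sky": 2.0, "grass": 0.5})
right = Node({"sky": 0.25, "grass": 1.0})
root = Node({"sky": 0.0, "grass": 0.0}, [left, right])

score, labels = parse(root, agree)
print(score, labels)
```

In this toy example the right-hand child prefers "grass" on its own scores, but the parent-child term pulls it to "sky", which is the kind of contextual effect a hierarchy can express. The pass visits each node once and compares every parent label with every child label, so the cost of this sketch grows linearly with the number of nodes and quadratically with the number of labels per node.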

INDEX TERMS

polynomials, context-free grammars, dynamic programming, image segmentation, inference mechanisms, learning (artificial intelligence), object recognition, coarse-to-fine representation, HIM, image recursive segmentation, object recognition templates, hierarchical image model, image parsing, contextual information, natural language models, sentence structure, hierarchical representation, rapid inference algorithm, polynomial time algorithm, image labeling, machine learning methods, labeled data set, public MSRC image data sets, PASCAL VOC 2007 image data sets, hierarchical systems, scene analysis, scene labeling, hierarchy, parsing, segmentation

CITATION

Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, A. Yuille, "Recursive segmentation and recognition templates for image parsing",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 34, no. 2, pp. 359-371, February 2012, doi:10.1109/TPAMI.2011.160