Bottom-Up/Top-Down Image Parsing with Attribute Grammar
January 2009 (vol. 31 no. 1)
pp. 59-73
Feng Han, University of California, Los Angeles, Los Angeles
Song-Chun Zhu, University of California, Los Angeles, Los Angeles
This paper presents a simple attribute graph grammar as a generative representation for man-made scenes, such as buildings, hallways, kitchens, and living rooms, and studies an effective top-down/bottom-up inference algorithm for parsing images by maximizing a Bayesian posterior probability or, equivalently, minimizing a description length (MDL). Given an input image, the inference algorithm computes (or constructs) a parse graph, which includes a parse tree for the hierarchical decomposition and a number of spatial constraints. In the inference algorithm, the bottom-up step detects an excessive number of rectangles as weighted candidates, which are sorted in a certain order and activate top-down predictions of occluded or missing components through the grammar rules. In the experiments, we show that the grammar and top-down inference can substantially improve the performance of bottom-up detection.
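
The equivalence between maximizing the posterior and minimizing a description length, mentioned above, follows from a standard argument, sketched here. The symbols are assumptions made for illustration and are not taken from the abstract: W denotes the parse graph and I the input image.

```latex
% Assumed notation: W = parse graph, I = input image.
\begin{aligned}
W^{*} &= \arg\max_{W} \, p(W \mid I) \\
      &= \arg\max_{W} \, p(I \mid W)\, p(W)
         && \text{(Bayes' rule; } p(I) \text{ does not depend on } W\text{)} \\
      &= \arg\min_{W} \, \bigl[ -\log p(I \mid W) - \log p(W) \bigr]
         && \text{(}-\log \text{ is monotonically decreasing)}
\end{aligned}
```

Under Shannon coding, the term -log p(W) can be read as the code length of the parse graph and -log p(I | W) as the code length of the image given that graph, so their sum is a total description length. The specific prior and likelihood models used in the paper are not given in the abstract and are not assumed here.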

Index Terms:
Pattern analysis, Algorithms, Statistical
Citation:
Feng Han, Song-Chun Zhu, "Bottom-Up/Top-Down Image Parsing with Attribute Grammar," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 59-73, Jan. 2009, doi:10.1109/TPAMI.2008.65