Issue No.07 - July (2013 vol.35)
pp: 1744-1756
O. Teboul, MAS Lab., Ecole Centrale Paris, Chatenay-Malabry, France
I. Kokkinos, Ecole Centrale Paris-INRIA Saclay, Chatenay-Malabry, France
L. Simon, GREYC, Ecole Nat. Super. d'Ing. de Caen, Caen, France
P. Koutsourakis, Ecole Centrale Paris, Univ. of Crete, Chatenay-Malabry, France
N. Paragios, Ecole Centrale Paris-Ecole des Ponts-ParisTech-INRIA Saclay, Chatenay-Malabry, France
In this paper, we use shape grammars (SGs) for facade parsing, which amounts to segmenting 2D building facades into balconies, walls, windows, and doors in an architecturally meaningful manner. The main thrust of our work is the introduction of reinforcement learning (RL) techniques to deal with the computational complexity of the problem. RL provides us with techniques such as Q-learning and state aggregation which we exploit to efficiently solve facade parsing. We initially phrase the 1D parsing problem in terms of a Markov Decision Process, paving the way for the application of RL-based tools. We then develop novel techniques for the 2D shape parsing problem that take into account the specificities of the facade parsing problem. Specifically, we use state aggregation to enforce the symmetry of facade floors and demonstrate how to use RL to exploit bottom-up, image-based guidance during optimization. We provide systematic results on the Paris building dataset and obtain state-of-the-art results in a fraction of the time required by previous methods. We validate our method under diverse imaging conditions and make our software and results available online.
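To make the idea of phrasing 1D parsing as a Markov Decision Process concrete, the following is a minimal sketch of tabular Q-learning on a toy 1D strip. It is not the authors' grammar or software. Everything specific in it is an assumption made for illustration: the two-symbol alternation rule (wall, window, wall, ...), the per-position `merit` table standing in for image evidence, the `max_len` bound on segment length, and all hyperparameters. A state is (position, next symbol), an action is the length of the next segment, and the reward is the summed merit of the positions that segment labels.

```python
import random


def q_learning_parse(merit, labels=("wall", "window"), max_len=4,
                     episodes=20000, alpha=0.5, gamma=1.0, epsilon=0.5, seed=0):
    """Parse a 1D strip into alternating segments with tabular Q-learning.

    merit[x][label] is the (assumed) reward for giving position x that label.
    Returns a list of (label, start, end) segments with end exclusive.
    """
    rng = random.Random(seed)
    n = len(merit)
    q = {}  # (position, label index, segment length) -> estimated return

    def actions(pos):
        # A segment must fit in the remaining strip.
        return range(1, min(max_len, n - pos) + 1)

    def reward(pos, k, length):
        return sum(merit[x][labels[k]] for x in range(pos, pos + length))

    def best(pos, k):
        # Value and argmax of the greedy action; terminal states are worth 0.
        if pos >= n:
            return 0.0, None
        return max((q.get((pos, k, a), 0.0), a) for a in actions(pos))

    for _ in range(episodes):
        pos, k = 0, 0
        while pos < n:
            # Epsilon-greedy behaviour policy.
            if rng.random() < epsilon:
                a = rng.choice(list(actions(pos)))
            else:
                a = best(pos, k)[1]
            next_pos, next_k = pos + a, (k + 1) % len(labels)
            # Q-learning update: bootstrap from the best next action.
            target = reward(pos, k, a) + gamma * best(next_pos, next_k)[0]
            old = q.get((pos, k, a), 0.0)
            q[(pos, k, a)] = old + alpha * (target - old)
            pos, k = next_pos, next_k

    # Read out the parse by following the greedy policy.
    parse, pos, k = [], 0, 0
    while pos < n:
        a = best(pos, k)[1]
        parse.append((labels[k], pos, pos + a))
        pos, k = pos + a, (k + 1) % len(labels)
    return parse


# Toy strip: merit is +1 where the label matches a hand-made layout, -1 otherwise.
truth = ["wall"] * 3 + ["window"] * 2 + ["wall"] * 3 + ["window"] * 2
merit = [{lab: (1.0 if lab == t else -1.0) for lab in ("wall", "window")}
         for t in truth]
print(q_learning_parse(merit))
```

Because Q-learning is off-policy, a fairly high exploration rate is used here so that every segment length is tried often on such a small problem. The sketch leaves out state aggregation and image-driven exploration, which the abstract names as the ingredients for the 2D case.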
Grammar, Shape, Markov processes, Learning, Equations, Optimization, Image segmentation, Markov decision processes, Image parsing, shape grammar, reinforcement learning, semantic segmentation, data-driven exploration
O. Teboul, I. Kokkinos, L. Simon, P. Koutsourakis, N. Paragios, "Parsing Facades with Shape Grammars and Reinforcement Learning", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 7, pp. 1744-1756, July 2013, doi:10.1109/TPAMI.2012.252