Issue No.09 - September (2010 vol.32)
pp: 1673-1687
Vladimir Nedović, University of Amsterdam, Amsterdam
Arnold W.M. Smeulders, University of Amsterdam, Amsterdam
André Redert, Philips Research Laboratories Eindhoven, Eindhoven
Jan-Mark Geusebroek, University of Amsterdam, Amsterdam
ABSTRACT
Reconstruction of 3D scene geometry is an important element for scene understanding, autonomous vehicle and robot navigation, image retrieval, and 3D television. We propose accounting for the inherent structure of the visual world when trying to solve the scene reconstruction problem. Consequently, we identify geometric scene categorization as the first step toward robust and efficient depth estimation from single images. We introduce 15 typical 3D scene geometries called stages, each with a unique depth profile, which roughly correspond to a large majority of broadcast video frames. Stage information serves as a first approximation of global depth, narrowing down the search space in depth estimation and object localization. We propose different sets of low-level features for depth estimation, and perform stage classification on two diverse data sets of television broadcasts. Classification results demonstrate that stages can often be efficiently learned from low-dimensional image representations.
INDEX TERMS
Scene geometry, scene structure, depth estimation, scene categorization, stages.
CITATION
Vladimir Nedović, Arnold W.M. Smeulders, André Redert, Jan-Mark Geusebroek, "Stages as Models of Scene Geometry," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 9, pp. 1673-1687, September 2010, doi:10.1109/TPAMI.2009.174