Issue No. 05, May 2009 (vol. 31)
pp: 824-840
Ashutosh Saxena , Stanford University, CA
Min Sun , Princeton University, NJ
Andrew Y. Ng , Stanford University, CA
ABSTRACT
We consider the problem of estimating detailed 3D structure from a single still image of an unstructured environment. Our goal is to create 3D models that are both quantitatively accurate and visually pleasing. For each small homogeneous patch in the image, we use a Markov Random Field (MRF) to infer a set of “plane parameters” that capture both the 3D location and the 3D orientation of the patch. The MRF, trained via supervised learning, models both image depth cues and the relationships between different parts of the image. Other than assuming that the environment is made up of a number of small planes, our model makes no explicit assumptions about the structure of the scene. This enables the algorithm to capture much more detailed 3D structure than prior methods do, and also to give a much richer experience in the 3D flythroughs created using image-based rendering, even for scenes with significant nonvertical structure. Using this approach, we have created qualitatively correct 3D models for 64.9 percent of 588 images downloaded from the Internet. We have also extended our model to produce large-scale 3D models from a few images.
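The abstract says that a single set of "plane parameters" per patch captures both 3D location and 3D orientation, but does not say how. The sketch below illustrates one way three numbers can do this. It assumes a parameterization not stated in this abstract: the camera center is at the origin, and every point q on a patch's plane satisfies α·q = 1. Under that assumption, α/|α| is the plane's unit normal (orientation), 1/|α| is its distance from the camera (location), and the depth along a viewing ray r is 1/(α·r). The function names are hypothetical and for illustration only. This is not the authors' code, and it does not cover the MRF inference itself.

```python
import math

def plane_from_params(alpha):
    """Return (unit normal, distance from camera) for hypothetical plane params alpha.

    Assumption: camera center at the origin; points q on the plane satisfy alpha . q = 1.
    """
    norm = math.sqrt(sum(a * a for a in alpha))
    if norm == 0.0:
        raise ValueError("alpha must be non-zero")
    normal = tuple(a / norm for a in alpha)  # orientation of the patch
    return normal, 1.0 / norm                # closest distance from camera to the plane

def depth_along_ray(alpha, ray):
    """Distance d along the (normalized) viewing ray r so that d*r lies on the plane.

    From alpha . (d * r) = 1 it follows that d = 1 / (alpha . r).
    """
    r_norm = math.sqrt(sum(x * x for x in ray))
    if r_norm == 0.0:
        raise ValueError("ray must be non-zero")
    r = tuple(x / r_norm for x in ray)
    denom = sum(a * x for a, x in zip(alpha, r))
    if denom <= 0.0:
        # Ray is parallel to the plane, or meets it only behind the camera.
        return math.inf
    return 1.0 / denom
```

For example, `alpha = (0.0, 0.0, 0.2)` describes a fronto-parallel plane 5 units in front of the camera. The ray straight ahead, `(0, 0, 1)`, reaches it at depth 5. The oblique ray `(1, 0, 1)` reaches it at about 5·√2.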
INDEX TERMS
Machine learning, monocular vision, learning depth, vision and scene understanding, scene analysis, depth cues.
CITATION
Ashutosh Saxena, Min Sun, Andrew Y. Ng, “Make3D: Learning 3D Scene Structure from a Single Still Image,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 5, pp. 824-840, May 2009, doi:10.1109/TPAMI.2008.132