Computer Vision, IEEE International Conference on (2007)
Rio de Janeiro, Brazil
Oct. 14, 2007 to Oct. 21, 2007
Ashutosh Saxena, Computer Science Department, Stanford University, Stanford, CA 94305. firstname.lastname@example.org
Min Sun, Computer Science Department, Stanford University, Stanford, CA 94305. email@example.com
Andrew Y. Ng, Computer Science Department, Stanford University, Stanford, CA 94305. firstname.lastname@example.org
We consider the task of creating a 3-d model of a large novel environment, given only a small number of images of the scene. This is a difficult problem, because if the images are taken from very different viewpoints or if they contain similar-looking structures, then most geometric reconstruction methods will have great difficulty finding good correspondences. Further, the reconstructions given by most algorithms include only points in 3-d that were observed in two or more images; a point observed only in a single image would not be reconstructed. In this paper, we show how monocular image cues can be combined with triangulation cues to build a photo-realistic model of a scene given only a few images, even ones taken from very different viewpoints or with little overlap. Our approach begins by over-segmenting each image into small patches (superpixels). It then simultaneously tries to infer the 3-d position and orientation of every superpixel in every image. This is done using a Markov Random Field (MRF) which simultaneously reasons about monocular cues and about the relations between multiple image patches, both within the same image and across different images (triangulation cues). MAP inference in our model is efficiently approximated using a series of linear programs, and our algorithm scales well to a large number of images.
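To make the notion of a triangulation cue concrete, the following is a minimal sketch of two-view triangulation by the standard midpoint method. It is not the MRF formulation described in the abstract; it only illustrates why a point seen in two images can be located in 3-d while a point seen in one image leaves its depth undetermined. The function name `triangulate_midpoint`, the ray representation (camera center plus viewing direction), and the tolerance `eps` are assumptions introduced for this illustration.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(
    o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3, eps: float = 1e-12
) -> Optional[Vec3]:
    """Estimate a 3-d point from two viewing rays (hypothetical helper).

    Each ray is o + s * d, where o is a camera center and d is the viewing
    direction through the matched pixel. The estimate is the midpoint of the
    shortest segment joining the two rays. Returns None when the rays are
    (nearly) parallel, since depth cannot be triangulated in that case.
    """
    w = tuple(a - b for a, b in zip(o1, o2))
    a = _dot(d1, d1)
    b = _dot(d1, d2)
    c = _dot(d2, d2)
    d = _dot(d1, w)
    e = _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # parallel rays: no unique closest point
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))


if __name__ == "__main__":
    # Two cameras one unit apart, both observing the point (0.5, 0, 2).
    point = triangulate_midpoint(
        (0.0, 0.0, 0.0), (0.5, 0.0, 2.0),
        (1.0, 0.0, 0.0), (-0.5, 0.0, 2.0),
    )
    print(point)
```

With a single ray there is no second constraint to intersect, so every depth along that ray is equally consistent with the image; this is the gap that monocular cues are meant to fill.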
A. Saxena, M. Sun and A. Y. Ng, "3-D Reconstruction from Sparse Views using Monocular Vision," 2007 11th IEEE International Conference on Computer Vision (ICCV), Rio de Janeiro, 2007, pp. 1-8.