Probabilistic Fusion of Stereo with Color and Contrast for Bilayer Segmentation
September 2006 (vol. 28 no. 9)
pp. 1480-1492
This paper describes models and algorithms for the real-time segmentation of foreground from background layers in stereo video sequences. Automatic separation of layers from color/contrast or from stereo alone is known to be error-prone. Here, color, contrast, and stereo matching information are fused to infer layers accurately and efficiently. The first algorithm, Layered Dynamic Programming (LDP), solves stereo in an extended six-state space that represents both foreground/background layers and occluded regions. The stereo-match likelihood is then fused with a contrast-sensitive color model that is learned on the fly, and stereo disparities are obtained by dynamic programming. The second algorithm, Layered Graph Cut (LGC), does not directly solve stereo. Instead, the stereo-match likelihood is marginalized over disparities to evaluate foreground and background hypotheses and then fused with a contrast-sensitive color model like the one used in LDP. Segmentation is solved efficiently by ternary graph cut. Both algorithms are evaluated with respect to ground truth data and found to have similar performance, substantially better than either stereo or color/contrast alone. However, their characteristics with respect to computational efficiency are rather different. The algorithms are demonstrated in the application of background substitution and shown to give good-quality composite video output.
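
The LGC idea of marginalizing the stereo-match likelihood over disparities, then fusing it with color evidence, can be illustrated with a minimal per-pixel sketch. It is not the paper's implementation: the function names, the per-layer disparity priors, the scalar color likelihoods, and the exponential form of the contrast weight are all assumptions chosen for illustration, and the ternary graph cut that produces the final segmentation is omitted.

```python
import math


def marginalize_over_disparity(match_likelihood, disparity_prior):
    """Return sum_d p(z | d) * p(d | layer) for one pixel.

    Both arguments are sequences indexed by candidate disparity.
    The per-layer prior is a hypothetical input, not taken from the paper.
    """
    if len(match_likelihood) != len(disparity_prior):
        raise ValueError("one likelihood value is needed per disparity")
    return sum(l * p for l, p in zip(match_likelihood, disparity_prior))


def contrast_weight(color_a, color_b, beta):
    """Pairwise weight that decays as neighbouring colours differ.

    The exp(-beta * squared distance) form is an assumed, commonly used
    choice for a contrast-sensitive smoothness term.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(color_a, color_b))
    return math.exp(-beta * sq_dist)


def layer_log_odds(match_likelihood, prior_fg, prior_bg,
                   color_lik_fg, color_lik_bg):
    """Fuse stereo and colour evidence; a positive score favours foreground."""
    stereo_fg = marginalize_over_disparity(match_likelihood, prior_fg)
    stereo_bg = marginalize_over_disparity(match_likelihood, prior_bg)
    return (math.log(stereo_fg * color_lik_fg)
            - math.log(stereo_bg * color_lik_bg))


# Toy pixel with three candidate disparities; all numbers are made up.
score = layer_log_odds(
    match_likelihood=[0.1, 0.2, 0.7],
    prior_fg=[0.0, 0.2, 0.8],   # foreground assumed to favour large disparity
    prior_bg=[0.8, 0.2, 0.0],   # background assumed to favour small disparity
    color_lik_fg=0.5,
    color_lik_bg=0.5,
)
```

In this toy case the color likelihoods are equal, so the sign of `score` is decided by the marginalized stereo evidence alone. In a full system a term like `contrast_weight` would set the cost of cutting between neighbouring pixels, so that layer boundaries prefer to follow strong color edges.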

Index Terms:
Computer vision, 3D/stereo scene analysis, image processing and computer vision, parameter learning, dynamic programming.
Citation:
Vladimir Kolmogorov, Antonio Criminisi, Andrew Blake, Geoffrey Cross, Carsten Rother, "Probabilistic Fusion of Stereo with Color and Contrast for Bilayer Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1480-1492, Sept. 2006, doi:10.1109/TPAMI.2006.193