Issue No. 03 - March (2011 vol. 33)
pp. 603-617
Guofeng Zhang, Zhejiang University, Hangzhou
Jiaya Jia, The Chinese University of Hong Kong, Hong Kong
Wei Hua, Zhejiang University, Hangzhou
Hujun Bao, Zhejiang University, Hangzhou
ABSTRACT
Extracting high-quality dynamic foreground layers from a video sequence is a challenging problem due to the coupling of color, motion, and occlusion. Many approaches assume that the background scene is static or undergoes a planar perspective transformation. In this paper, we relax these restrictions and present a comprehensive system for accurately computing object motion, layer, and depth information. We propose a novel algorithm that combines different cues to extract the foreground layer, using a voting-like scheme that is robust to outliers in the optimization. The system can handle difficult examples in which the background is nonplanar and the camera moves freely during video capture. Our work has several applications, such as high-quality view interpolation and video editing.
INDEX TERMS
Bilayer segmentation, depth recovery, motion estimation, video editing.
CITATION
Guofeng Zhang, Jiaya Jia, Wei Hua, Hujun Bao, "Robust Bilayer Segmentation and Motion/Depth Estimation with a Handheld Camera," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 3, pp. 603-617, March 2011, doi:10.1109/TPAMI.2010.115