Video Stereolization: Combining Motion Analysis with User Interaction
July 2012 (vol. 18 no. 7)
pp. 1079-1088
Jizhou Gao, Center for Visualization & Virtual Environments, Univ. of Kentucky, Lexington, KY, USA
Miao Liao, Center for Visualization & Virtual Environments, Univ. of Kentucky, Lexington, KY, USA
Ruigang Yang, Center for Visualization & Virtual Environments, Univ. of Kentucky, Lexington, KY, USA
Minglun Gong, Dept. of Comput. Sci., Memorial Univ. of Newfoundland, St. John's, NL, Canada
We present a semiautomatic system that converts conventional videos into stereoscopic videos by combining motion analysis with user interaction, aiming to transfer as much labeling work as possible from the user to the computer. In addition to the widely used structure from motion (SFM) techniques, we develop two new methods that analyze the optical flow to provide additional qualitative depth constraints. These methods remove the camera movement restriction imposed by SFM, so that general motions can be used in scene depth estimation, the central problem in mono-to-stereo conversion. With these algorithms, the user's labeling task is significantly simplified. We further develop a quadratic programming approach that incorporates both quantitative depth and qualitative depth (such as that from user scribbles) to recover dense depth maps for all frames, from which stereoscopic views can be synthesized. In addition to visual results, we present user study results showing that our approach is more intuitive and less labor intensive, while producing a 3D effect comparable to that of current state-of-the-art interactive algorithms.
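To make the idea of mixing quantitative and qualitative depth cues concrete, here is a minimal sketch in pure Python. It is not the authors' formulation: the abstract does not give the energy terms, so the 1D pixel strip, the function name `solve_depth`, the weights, and the use of a squared-hinge penalty minimized by gradient descent (instead of a true quadratic programming solver with hard inequality constraints) are all assumptions made for illustration.

```python
def solve_depth(n, anchors, orderings, margin=1.0,
                smooth_w=1.0, anchor_w=10.0, order_w=10.0,
                iters=20000, step=0.01):
    """Estimate depths for a 1D strip of n pixels (hypothetical toy setup).

    anchors:   {pixel_index: depth}, quantitative cues (e.g. from SFM).
    orderings: [(near, far)], qualitative cues meaning
               depth[far] >= depth[near] + margin (e.g. from a scribble).
    The ordering cues are enforced softly with a squared-hinge penalty,
    which only approximates a hard-constrained quadratic program.
    """
    d = [0.0] * n
    for _ in range(iters):
        g = [0.0] * n
        # Smoothness term: neighbouring pixels should have similar depth.
        for i in range(n - 1):
            diff = d[i] - d[i + 1]
            g[i] += 2.0 * smooth_w * diff
            g[i + 1] -= 2.0 * smooth_w * diff
        # Quantitative term: stay close to the known depth values.
        for i, value in anchors.items():
            g[i] += 2.0 * anchor_w * (d[i] - value)
        # Qualitative term: penalise only violated ordering relations.
        for near, far in orderings:
            violation = d[near] + margin - d[far]
            if violation > 0.0:
                g[near] += 2.0 * order_w * violation
                g[far] -= 2.0 * order_w * violation
        # Plain gradient descent step on the convex energy.
        d = [x - step * gx for x, gx in zip(d, g)]
    return d


# One known depth at pixel 0, and one "pixel 3 is behind pixel 2" relation.
depths = solve_depth(6, anchors={0: 2.0}, orderings=[(2, 3)])
print([round(x, 3) for x in depths])
```

In this toy setup the anchored side of the strip stays at the known depth, while a depth step appears between pixels 2 and 3. Because the ordering is only a soft penalty, the step is slightly smaller than `margin`; raising `order_w` pushes it closer to the full margin.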

Index Terms:
video signal processing, image motion analysis, image sensors, image sequences, interactive systems, quadratic programming, stereo image processing, user interfaces, interactive algorithms, video stereolization, motion analysis, user interaction, semiautomatic system, stereoscopic videos, structure-from-motion techniques, SFM techniques, optical flow analysis, qualitative depth constraints, camera movement restriction, scene depth estimation, mono-to-stereo conversion, user labeling task, quantitative depth, user scribbling, 3D effect, Three dimensional displays, Cameras, Pixel, Labeling, Image segmentation, user labeling, Semiautomatic 2D-3D conversion, stereo/3D video/movie
Citation:
Jizhou Gao, Miao Liao, Ruigang Yang, Minglun Gong, "Video Stereolization: Combining Motion Analysis with User Interaction," IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 7, pp. 1079-1088, July 2012, doi:10.1109/TVCG.2011.114