Transferring of Speech Movements from Video to 3D Face Space
January/February 2007 (vol. 13 no. 1)
pp. 58-69

Abstract: We present a novel method for transferring speech animation recorded in low-quality videos to high-resolution 3D face models. The basic idea is to synthesize the animated faces by interpolating among a small set of 3D key face shapes that span a 3D face space. The 3D key shapes are extracted by an unsupervised learning process in the 2D video space, which yields a set of 2D visemes that are then mapped to the 3D face space. The learning process consists of two main phases: 1) Isomap-based nonlinear dimensionality reduction to embed the video speech movements into a low-dimensional manifold, and 2) K-means clustering in the low-dimensional space to extract 2D key viseme frames. Our main contribution is the use of the Isomap-based learning method to extract the intrinsic geometry of the speech video space, which makes it possible to define the 3D key viseme shapes. To do so, we need only capture a limited number of 3D key face models with a general-purpose 3D scanner. Moreover, we develop a skull movement recovery method based on simple anatomical structures to enhance 3D realism in local mouth movements. Experimental results show that our method achieves realistic 3D animation with a small number of 3D key face models.
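
The two learning phases named in the abstract can be illustrated with a generic sketch. It assumes each video frame has already been reduced to a feature vector (for example, tracked mouth-contour coordinates; this preprocessing is a hypothetical stand-in, not described here). The sketch builds a standard Isomap embedding (k-nearest-neighbour graph, Floyd-Warshall geodesic distances, classical MDS), runs plain K-means in the embedded space, and returns the frame nearest each cluster centre as a candidate key viseme frame. The helper names, the neighbourhood size, the embedding dimension, and the nearest-to-centroid selection rule are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def isomap_embed(frames: np.ndarray, n_neighbors: int = 6, n_components: int = 2) -> np.ndarray:
    """Embed frames (N x D rows) with Isomap: kNN graph -> geodesics -> classical MDS."""
    n = frames.shape[0]
    # Pairwise Euclidean distances between per-frame feature vectors.
    diff = frames[:, None, :] - frames[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Keep only edges to each frame's k nearest neighbours (symmetrised).
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
        graph[nearest[i], i] = dist[i, nearest[i]]

    # Floyd-Warshall: shortest paths approximate geodesic distances on the manifold.
    # O(N^3), so this toy version suits only short clips.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS on the double-centred squared geodesic distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        sq = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = sq.argmin(axis=1)
        # Keep the old centroid if a cluster happens to become empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def key_frames(frames: np.ndarray, n_keys: int, n_neighbors: int = 6,
               n_components: int = 2, seed: int = 0) -> list:
    """Hypothetical helper: indices of frames nearest each K-means centre in Isomap space."""
    emb = isomap_embed(frames, n_neighbors, n_components)
    _, centroids = kmeans(emb, n_keys, seed=seed)
    sq = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return sorted(set(int(i) for i in sq.argmin(axis=0)))


if __name__ == "__main__":
    # Toy stand-in for per-frame mouth features: a helix sampled at 60 "frames".
    t = np.linspace(0, 4 * np.pi, 60)
    frames = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
    print(key_frames(frames, n_keys=4))
```

Isomap is used here rather than linear PCA because geodesic distances follow the curved path that speech movements trace through frame space; the selected 2D key frames would then be paired with scanned 3D key face models, a mapping step this sketch does not cover.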

Index Terms:
Facial animation, speech synchronization, visual speech synthesis, performance-driven animation, machine learning.
Citation:
Yuru Pei, Hongbin Zha, "Transferring of Speech Movements from Video to 3D Face Space," IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 1, pp. 58-69, Jan./Feb. 2007, doi:10.1109/TVCG.2007.22