Creating Speech-Synchronized Animation
May/June 2005 (vol. 11 no. 3)
pp. 341-352
Scott A. King and Richard E. Parent, IEEE Computer Society
We present a facial model designed primarily to support animated speech. The model takes facial geometry as input and transforms it into a parametric deformable model. A muscle-based parameterization allows easier integration of speech synchrony with facial expressions. A highly deformable lip model is grafted onto the input geometry to provide the geometric complexity needed for creating lip shapes and high-quality renderings, and a highly deformable tongue model represents the shapes the tongue assumes during speech. Teeth, gums, and upper-palate geometry complete the inner mouth. To decrease processing time, we deform the facial surface hierarchically. We also present a method to animate the facial model over time, creating animated speech with a model of coarticulation that blends visemes together using dominance functions. We treat visemes as a dynamic shaping of the vocal tract, describing them as curves rather than keyframes. We demonstrate the utility of these techniques by implementing them in a text-to-audiovisual-speech system that creates animation of speech from unrestricted text. The facial and coarticulation models must first be interactively initialized; the system then automatically creates accurate real-time animated speech from the input text, producing large amounts of animated speech with very low resource requirements.
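The coarticulation approach described above blends visemes with dominance functions, following the model of Cohen and Massaro [19]. The sketch below illustrates the general idea only: each viseme exerts a dominance that peaks at its center time and decays with temporal distance, and an articulatory parameter is the dominance-weighted average of the viseme targets. The exponential function shape and all numeric values here are illustrative assumptions, not parameters taken from the paper.

```python
import math

def dominance(t, center, alpha, theta, c=1.0):
    # Exponential dominance function (Cohen-Massaro style): peaks at the
    # viseme's center time and decays with temporal distance from it.
    return alpha * math.exp(-theta * abs(t - center) ** c)

def blend(t, visemes):
    # Each viseme is (center_time, target_value, alpha, theta).
    # The articulatory parameter at time t is the dominance-weighted
    # average of all viseme targets, so overlapping visemes influence
    # each other (coarticulation).
    num = den = 0.0
    for center, target, alpha, theta in visemes:
        d = dominance(t, center, alpha, theta)
        num += d * target
        den += d
    return num / den if den > 0 else 0.0

# Two overlapping visemes for a single lip parameter (hypothetical values):
visemes = [(0.10, 0.8, 1.0, 20.0),   # e.g., open jaw for a vowel
           (0.25, 0.1, 1.0, 30.0)]   # e.g., closed lips for /m/
# Sample the blended parameter over 0.0-0.39 s to get a smooth curve.
curve = [blend(0.01 * i, visemes) for i in range(40)]
```

Because every viseme's dominance is nonzero everywhere, the blended curve transitions smoothly between targets instead of snapping between keyframes, which is the behavior the abstract attributes to treating visemes as curves.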

[1] H. McGurk and J. MacDonald, “Hearing Lips and Seeing Voices,” Nature, vol. 264, pp. 746-748, 1976.
[2] B. deGraf, “'Performance' Facial Animation,” SIGGRAPH '89 Course Notes 22: State of the Art in Facial Animation, pp. 8-17, July 1989.
[3] L. Williams, “Performance-Driven Facial Animation,” Computer Graphics (SIGGRAPH '90 Proc.), vol. 24, pp. 235-242, Aug. 1990.
[4] D.J. Sturman, “Computer Puppetry,” IEEE Computer Graphics and Applications, vol. 18, no. 1, pp. 38-45, Jan./Feb. 1998.
[5] E.C. Patterson, P.C. Litwinowicz, and N. Greene, “Facial Animation by Spatial Mapping,” Proc. Computer Animation '91, N. Magnenat-Thalmann and D. Thalmann, eds., 1991.
[6] J.A. Provine and L.T. Bruton, “Lip Synchronization in 3-D Model Based Coding for Video-Conferencing,” Proc. IEEE Int'l Symp. Circuits and Systems, pp. 453-456, May 1995.
[7] C. Bregler, M. Covell, and M. Slaney, “Video Rewrite: Driving Visual Speech with Audio,” Proc. SIGGRAPH '97, pp. 353-360, Aug. 1997.
[8] T. Ezzat and T. Poggio, “Miketalk: A Talking Facial Display Based on Morphing Visemes,” Proc. Computer Animation '98, pp. 96-102, June 1998.
[9] M. Brand, “Voice Puppetry,” Proc. SIGGRAPH '99, pp. 21-28, Aug. 1999.
[10] T. Ezzat, G. Geiger, and T. Poggio, “Trainable Videorealistic Speech Animation,” ACM Trans. Graphics (Proc. SIGGRAPH 2002), vol. 21, no. 3, pp. 388-398, July 2002.
[11] F.I. Parke, “A Parametric Model for Human Faces,” PhD dissertation, Univ. of Utah, Salt Lake City, Dec. 1974.
[12] P. Bergeron and P. Lachapelle, “Controlling Facial Expressions and Body Movements in the Computer-Generated Animated Short 'Tony De Peltrie',” SIGGRAPH '85 Advanced Computer Animation Seminar Notes, pp. 1-19, July 1985.
[13] A. Pearce, B.M. Wyvill, G. Wyvill, and D. Hill, “Speech and Expression: A Computer Solution to Face Animation,” Proc. Graphics Interface '86, M. Green, ed., pp. 136-140, May 1986.
[14] J.P. Lewis and F.I. Parke, “Automated Lip-Synch and Speech Synthesis for Character Animation,” Proc. Human Factors in Computing Systems and Graphics Interface '87, J.M. Carroll and P.P. Tanner, eds., pp. 143-147, Apr. 1987.
[15] D.R. Hill, A. Pearce, and B.M. Wyvill, “Animating Speech: An Automated Approach Using Speech Synthesised by Rules,” The Visual Computer, vol. 3, no. 5, pp. 277-289, Mar. 1988.
[16] M. Nahas, H. Huitric, and M. Saintourens, “Animation of a B-Spline Figure,” The Visual Computer, vol. 3, no. 5, pp. 272-276, Mar. 1988.
[17] M. Nahas, H. Huitric, M. Rious, and J. Domey, “Registered 3D-Texture Imaging,” Proc. Computer Animation '90 (Second Workshop Computer Animation), N. Magnenat-Thalmann and D. Thalmann, eds., pp. 81-91, Apr. 1990.
[18] C. Pelachaud, N.I. Badler, and M. Steedman, “Linguistic Issues in Facial Animation,” Proc. Computer Animation '91, N. Magnenat-Thalmann and D. Thalmann, eds., pp. 15-30, 1991.
[19] M. Cohen and D. Massaro, “Modeling Coarticulation in Synthetic Visual Speech,” Models and Techniques in Computer Animation, N. Magnenat-Thalmann and D. Thalmann, eds., pp. 139-156, Tokyo: Springer-Verlag, 1993.
[20] K. Waters and T.M. Levergood, “DECface: An Automatic Lip-Synchronization Algorithm for Synthetic Faces,” Technical Report CRL 93/4, Digital Equipment Corp. Cambridge Research Lab, Sept. 1993.
[21] Y. Lee, D. Terzopoulos, and K. Waters, “Realistic Modeling for Facial Animation,” Proc. SIGGRAPH '95, pp. 55-62, Aug. 1995.
[22] B. Le Goff, “Automatic Modeling of Coarticulation in Text-to-Visual Speech Synthesis,” Proc. Eurospeech '97, vol. 3, pp. 1667-1670, Sept. 1997.
[23] M.M. Cohen, J. Beskow, and D.W. Massaro, “Recent Developments in Facial Animation: An Inside View,” Proc. Auditory Visual Speech Perception '98, pp. 201-206, Dec. 1998.
[24] I. Albrecht, J. Haber, and H.-P. Seidel, “Speech Synchronization for Physics-Based Facial Animation,” Proc. Int'l Conf. Computer Graphics, Visualization, and Computer Vision (WSCG 2002), pp. 9-16, Feb. 2002.
[25] J. Haber, K. Kähler, I. Albrecht, H. Yamauchi, and H.-P. Seidel, “Face to Face: From Real Humans to Realistic Facial Animation,” Proc. Israel-Korea Binat'l Conf. Geometrical Modeling and Computer Graphics, pp. 73-82, Oct. 2001.
[26] K. Kähler, J. Haber, H. Yamauchi, and H.-P. Seidel, “Head Shop: Generating Animated Head Models with Anatomical Structure,” Proc. ACM SIGGRAPH Symp. Computer Animation (SCA), pp. 55-64, July 2002.
[27] T. Guiard-Marigny, A. Adjoudani, and C. Benoit, “A 3-D Model of the Lips for Visual Speech Synthesis,” Proc. Second ESCA/IEEE Workshop, pp. 49-52, Sept. 1994.
[28] T. Guiard-Marigny, N. Tsingos, A. Adjoudani, C. Benoit, and M.-P. Gascuel, “3D Models of the Lips for Realistic Speech Animation,” Proc. Computer Animation '96, pp. 80-89, 1996.
[29] S. Basu and A. Pentland, “A Three-Dimensional Model of Human Lip Motions Trained from Video,” Proc. 1997 IEEE Workshop Non-Rigid and Articulated Objects (NAM '97), pp. 46-53, June 1997.
[30] S.A. King, R.E. Parent, and B. Olsafsky, “A Muscle-Based 3D Parametric Lip Model for Speech,” Deformable Avatars, N. Magnenat-Thalmann and D. Thalmann, eds., pp. 12-23, Boston/Dordrecht/London: Kluwer Academic, 2001.
[31] J. Kleiser, “A Fast, Efficient, Accurate Way to Represent the Human Face,” SIGGRAPH '89 Course Notes 22: State of the Art in Facial Animation, pp. 36-40, July 1989.
[32] W.T. Reeves, “Simple and Complex Facial Animation: Case Studies,” SIGGRAPH '90 Course Notes 26: State of the Art in Facial Animation, pp. 88-106, Aug. 1990.
[33] M. Stone, “Toward a Model of Three-Dimensional Tongue Movement,” J. Phonetics, vol. 19, pp. 309-320, 1991.
[34] M. Stone and A. Lundberg, “Three-Dimensional Tongue Surface Shapes of English Consonants and Vowels,” J. Acoustical Soc. Am., vol. 99, no. 6, pp. 3728-3737, June 1996.
[35] S. Maeda, “Compensatory Articulation during Speech: Evidence from the Analysis and Synthesis of Vocal-Tract Shapes Using an Articulatory Model,” Speech Production and Speech Modelling, W.J. Hardcastle and A. Marchal, eds., pp. 131-149, Dordrecht: Kluwer Academic, 1990.
[36] R. Wilhelms-Tricarico, “Physiological Modeling of Speech Production: Methods for Modeling Soft-Tissue Articulators,” J. Acoustical Soc. Am., vol. 97, no. 5, pp. 3085-3098, May 1995.
[37] R.F. Wilhelms-Tricarico and J.S. Perkell, “Biomechanical and Physiological Based Speech Modeling,” Progress in Speech Synthesis, J.P.H. Von Santen et al., eds., pp. 221-233, Springer, 1997.
[38] C. Pelachaud, C. van Overveld, and C. Seah, “Modeling and Animating the Human Tongue during Speech Production,” Proc. Computer Animation '94, pp. 40-49, May 1994.
[39] S.A. King and R.E. Parent, “A 3D Parametric Tongue Model for Animated Speech,” J. Visualization and Computer Animation, vol. 12, no. 3, pp. 107-115, 2001.
[40] R.D. Kent and F.D. Minifie, “Coarticulation in Recent Speech Production Models,” J. Phonetics, vol. 5, pp. 115-135, 1977.
[41] A. Löfqvist, “Speech as Audible Gestures,” Speech Production and Speech Modeling, W.J. Hardcastle and A. Marchal, eds., pp. 289-322, Dordrecht: Kluwer Academic, 1990.
[42] I. Albrecht, J. Haber, and H.-P. Seidel, “Automatic Generation of Non-Verbal Facial Expressions from Speech,” Proc. Computer Graphics Int'l (CGI) 2002, pp. 283-293, July 2002.
[43] I. Albrecht, J. Haber, M. Schröder, and H.-P. Seidel, “'May I Talk to You? :-)'— Facial Animation from Text,” Proc. Pacific Graphics 2002, pp. 77-86, Oct. 2002.
[44] J. Cassell, C. Pelachaud, N. Badler, M. Steedman, B. Achorn, T. Becket, B. Douville, S. Prevost, and M. Stone, “Animated Conversation: Rule-Based Generation of Facial Expression, Gesture, and Spoken Intonation for Multiple Conversational Agents,” Proc. SIGGRAPH '94, pp. 413-420, July 1994.
[45] J. Cassell, H.H. Vilhjálmsson, and T. Bickmore, “BEAT: The Behavior Expression Animation Toolkit,” Proc. SIGGRAPH '01, E. Fiume, ed., pp. 477-486, Aug. 2001.
[46] F.I. Parke and K. Waters, Computer Facial Animation. Wellesley, Mass.: A.K. Peters, 1996.
[47] J.R. Shewchuk, “Triangle: A Two-Dimensional Quality Mesh Generator,” 1996, accessed Sept. 1999.
[48] D. Dew and P.J. Jensen, Phonetic Processing: The Dynamics of Speech. Columbus, Ohio: Charles E. Merrill Publishing Company, 1977.
[49] A.W. Black, P. Taylor, R. Caley, and R. Clark, “The Festival Speech Synthesis System,” Aug. 1999.
[50] O. Fujimura and J. Lovins, “Syllables as Concatenative Phonetic Units,” Syllables and Segments, A. Bell and J.B. Hooper, eds., pp. 107-120, Amsterdam: North Holland, 1978.
[51] O. Fujimura, “Phonology and Phonetics— A Syllable-Based Model of Articulatory Organization,” J. Acoustic Soc. Japan (E), vol. 13, pp. 39-48, 1992.
[52] R. Sproat and O. Fujimura, “Allophonic Variation in English /l/ and Its Implications for Phonetic Implementation,” J. Phonetics, vol. 21, pp. 291-311, 1993.
[53] O. Fujimura, “The C/D Model and Prosodic Control of Articulatory Behavior,” Phonetica, vol. 57, pp. 128-138, 2000.
[54] “Flinger: Festival Singer,” Dec. 2001.
[55] “The MBROLA Project,” 1999.
[56] Cyberware, Inc., Oct. 2000.

Index Terms:
Facial animation, speech synchronization, lip synchronization, animation, visual speech synthesis, coarticulation, facial modeling.
Scott A. King, Richard E. Parent, "Creating Speech-Synchronized Animation," IEEE Transactions on Visualization and Computer Graphics, vol. 11, no. 3, pp. 341-352, May-June 2005, doi:10.1109/TVCG.2005.43