Computer Graphics International Conference (2005)
Stony Brook, NY, USA
June 2, 2005 to June 4, 2005
Z. Deng , Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
J.P. Lewis , Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
U. Neumann , Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
While speech animation fundamentally consists of a sequence of phonemes over time, sophisticated animation requires smooth interpolation and co-articulation effects, in which the preceding and following phonemes influence the shape of the current phoneme. Co-articulation has been approached in speech animation research in several ways, most often by simply smoothing the mouth geometry motion over time. Data-driven approaches tend to generate realistic speech animation, but they must store a large facial motion database, which is not feasible for real-time gaming and interactive applications on platforms such as PDAs and cell phones. In this paper we show that an accurate yet compact speech co-articulation model can be learned from facial motion capture data. An initial phoneme sequence is generated automatically by a text-to-speech (TTS) system; our learned co-articulation model is then applied to this phoneme sequence, producing natural and detailed motion. The contribution of this work is that speech co-articulation models "learned" from real human motion data can generate natural-looking speech motion while preserving the expressiveness of the animation via keyframing control. At the same time, the approach's compact size makes it well suited to interactive applications.
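The pipeline the abstract describes, blending per-phoneme mouth targets so that neighboring phonemes influence each other, can be sketched as follows. This is a minimal illustration, not the paper's learned model: the Gaussian influence weights and the scalar viseme targets here are assumptions standing in for parameters that the paper learns from motion capture data.

```python
import math

# Hypothetical viseme targets: one scalar mouth-opening value per phoneme.
# (A real system drives full facial geometry, not a single scalar.)
VISEME = {"m": 0.0, "a": 1.0, "o": 0.8, "t": 0.2}

def coarticulated_shape(phonemes, times, t, width=0.08):
    """Blend neighboring phoneme targets with Gaussian influence
    weights centered at each phoneme's time -- a simple stand-in
    for a learned co-articulation model."""
    weights = [math.exp(-((t - ti) / width) ** 2) for ti in times]
    total = sum(weights)
    return sum(w * VISEME[p] for w, p in zip(weights, phonemes)) / total

# Sample a smooth mouth-shape curve for the phoneme sequence "m a t".
phonemes = ["m", "a", "t"]
times = [0.0, 0.1, 0.2]
curve = [coarticulated_shape(phonemes, times, 0.01 * k) for k in range(21)]
```

Note how the peak of the curve at the "a" phoneme stays below its isolated target of 1.0: the closed-mouth "m" before it and the "t" after it pull the shape toward their own targets, which is the co-articulation effect the paper models from data rather than by hand-tuned smoothing.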
Z. Deng, J. P. Lewis and U. Neumann, "Synthesizing speech animation by learning compact speech co-articulation models," Computer Graphics International 2005 (CGI), Stony Brook, NY, USA, 2005, pp. 19-25.