Issue No.11 - Nov. (2012 vol.18)
B. H. Le , Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
Xiaohan Ma , Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
Zhigang Deng , Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
DOI: http://doi.ieeecomputersociety.org/10.1109/TVCG.2012.74
This paper describes a fully automated framework that generates realistic head motion, eye gaze, and eyelid motion simultaneously from live (or recorded) speech input. Its central idea is to learn separate yet interrelated statistical models for each component (head motion, gaze, and eyelid motion) from a prerecorded facial motion data set: 1) Gaussian Mixture Models and a gradient descent optimization algorithm are employed to generate head motion from speech features; 2) a Nonlinear Dynamic Canonical Correlation Analysis model is used to synthesize eye gaze from head motion and speech features; and 3) nonnegative linear regression is used to model voluntary eyelid motion, while a log-normal distribution is used to describe involuntary eye blinks. Several user studies based on the well-established paired comparison methodology are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator. Our evaluation results show that this approach can significantly outperform state-of-the-art head and eye motion generation algorithms. In addition, a novel mocap+video hybrid data acquisition technique is introduced to record high-fidelity head movement, eye gaze, and eyelid motion simultaneously.
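The third component (eyelid motion) is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation. It draws blink onset times whose inter-blink intervals follow a log-normal distribution, and it fits a nonnegative linear regression. The abstract does not state which input features drive the eyelid regression, and it gives no fitted distribution parameters. For that reason, the feature matrix `X` and the log-space parameters `mu` and `sigma` below are hypothetical placeholders. The solver is plain projected gradient descent, chosen here for brevity in place of a dedicated NNLS routine.

```python
import random


def sample_blink_times(duration_s, mu, sigma, seed=None):
    """Return blink onset times (seconds) in [0, duration_s).

    Inter-blink intervals are drawn i.i.d. from a log-normal distribution,
    i.e. log(interval) ~ Normal(mu, sigma). The values of mu and sigma are
    hypothetical inputs, not parameters reported in the paper.
    """
    rng = random.Random(seed)
    times = []
    t = rng.lognormvariate(mu, sigma)  # time of the first blink
    while t < duration_s:
        times.append(t)
        t += rng.lognormvariate(mu, sigma)  # add the next positive interval
    return times


def nonneg_least_squares(X, y, steps=5000):
    """Minimize 0.5 * ||X w - y||^2 subject to w >= 0.

    Uses projected gradient descent: take a gradient step, then clip
    negative weights to zero. The step size 1 / ||X||_F^2 is safe because
    the squared Frobenius norm upper-bounds the largest eigenvalue of X^T X.
    """
    n_features = len(X[0])
    fro_sq = sum(v * v for row in X for v in row)
    step = 1.0 / fro_sq
    w = [0.0] * n_features
    for _ in range(steps):
        # residual r = X w - y
        residual = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                    for row, yi in zip(X, y)]
        # gradient g = X^T r
        grad = [sum(r * row[j] for r, row in zip(residual, X))
                for j in range(n_features)]
        # gradient step followed by projection onto the nonnegative orthant
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    print(sample_blink_times(30.0, mu=1.0, sigma=0.5, seed=1))
    print(nonneg_least_squares([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0]))
```

In the second call, the unconstrained least-squares solution would be `[1, -1]`. The nonnegativity constraint clips the second weight to zero, and that kind of constraint keeps a regression from producing implausible, inverted contributions.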
video signal processing, computer animation, data acquisition, eye, face recognition, Gaussian processes, gradient methods, image motion analysis, log normal distribution, optimisation, realistic images, statistical analysis, facial animation, fully automated framework, realistic head motion generation, eye gaze generation, eyelid motion generation, live speech input, live speech driven head-and-eye motion generators, statistical models, facial motion data set, Gaussian mixture models, gradient descent optimization algorithm, speech features, nonlinear dynamic canonical correlation analysis model, eye gaze synthesis, nonnegative linear regression, voluntary eyelid motion model, log-normal distribution, mocap+video hybrid data acquisition technique, high-fidelity head movement recording, eye gaze recording, eyelid motion recording, Speech, Magnetic heads, Hidden Markov models, Humans, Synchronization, Data acquisition, Live speech driven, Facial animation, head and eye motion coupling, head motion synthesis, gaze synthesis, blinking model
B. H. Le, Xiaohan Ma, Zhigang Deng, "Live Speech Driven Head-and-Eye Motion Generators", IEEE Transactions on Visualization & Computer Graphics, vol.18, no. 11, pp. 1902-1914, Nov. 2012, doi:10.1109/TVCG.2012.74