Issue No. 05, September/October 1998 (vol. 18)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/38.708562
We present a method for estimating 3D motion from 2D image sequences showing head-and-shoulder scenes typical of videotelephony and teleconferencing applications. Our 3D model specifies the color and shape of the person in the video. Additionally, the model constrains facial motion and deformation to a set of facial expressions represented by the facial animation parameters (FAPs) defined by the MPEG-4 standard. Using this model, we obtain a description of both global and local 3D head motion as a function of the unknown facial parameters. Combining the 3D information with the optical flow constraint leads to a robust, linear algorithm that estimates the facial animation parameters from two successive frames with low computational complexity. Experimental results on synthetic and real data confirm the technique's applicability and show that image sequences of head-and-shoulder scenes can be encoded at bit rates below 0.6 kbit/s.
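The core of the approach is that, once the 3D model expresses each pixel's 2D displacement linearly in the unknown FAPs, substituting that motion model into the optical flow constraint yields an overdetermined linear system that can be solved by least squares. The following is a minimal sketch of that idea, assuming a synthetic per-pixel motion Jacobian `M` standing in for the one derived from the 3D head model (the variable names and setup are illustrative, not from the paper):

```python
import numpy as np

# Optical flow constraint per pixel i:  Ix_i*u_i + Iy_i*v_i + It_i = 0.
# Model assumption: displacement is linear in the FAP vector f:
#   [u_i, v_i] = M_i @ f,  with M_i the (2 x n_faps) motion Jacobian
#   obtained from the 3D head model (here: random stand-in data).
rng = np.random.default_rng(0)
n_pixels, n_faps = 500, 6

M = rng.normal(size=(n_pixels, 2, n_faps))   # per-pixel motion Jacobians (stand-in)
f_true = rng.normal(size=n_faps)             # "true" FAP changes between two frames
Ix = rng.normal(size=n_pixels)               # spatial image gradients
Iy = rng.normal(size=n_pixels)

# Temporal gradients consistent with the flow induced by f_true
uv = M @ f_true                              # (n_pixels, 2) displacements
It = -(Ix * uv[:, 0] + Iy * uv[:, 1])

# Substitute the motion model into the flow constraint:
#   (Ix_i * M_i[0] + Iy_i * M_i[1]) @ f = -It_i   ->   A f = b
A = Ix[:, None] * M[:, 0, :] + Iy[:, None] * M[:, 1, :]
b = -It

# Solve the overdetermined linear system in the least-squares sense
f_est, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(f_est, f_true, atol=1e-8))  # prints True on this synthetic data
```

Because the system is linear in the FAPs, the estimate comes from a single least-squares solve per frame pair, which accounts for the low computational complexity the abstract claims.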
Index Terms: model-based video coding, 3D head model, MPEG-4, facial expression analysis, virtual conferencing.
P. Eisert and B. Girod, "Analyzing Facial Expressions for Virtual Conferencing," IEEE Computer Graphics and Applications, vol. 18, no. 5, pp. 70-78, Sept./Oct. 1998.