ASCII Text
Mehmet E. Sargin, Yucel Yemez, Engin Erzin, Ahmet M. Tekalp, "Analysis of Head Gesture and Prosody Patterns for Prosody-Driven Head-Gesture Animation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 8, pp. 1330-1345, August 2008.
BibTeX
@article{10.1109/TPAMI.2007.70797,
  author = {Mehmet E. Sargin and Yucel Yemez and Engin Erzin and Ahmet M. Tekalp},
  title = {Analysis of Head Gesture and Prosody Patterns for Prosody-Driven Head-Gesture Animation},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume = {30},
  number = {8},
  issn = {0162-8828},
  year = {2008},
  pages = {1330-1345},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TPAMI.2007.70797},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY - JOUR
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
TI - Analysis of Head Gesture and Prosody Patterns for Prosody-Driven Head-Gesture Animation
IS - 8
SN - 0162-8828
SP - 1330
EP - 1345
EPD - 1330-1345
A1 - Mehmet E. Sargin
A1 - Yucel Yemez
A1 - Engin Erzin
A1 - Ahmet M. Tekalp
PY - 2008
KW - Audio input/output
KW - Face and gesture recognition
KW - Pattern analysis
VL - 30
JA - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER -
[1] T. Chen, “Audiovisual Speech Processing,” IEEE Signal Processing Magazine, vol. 18, pp. 9-21, 2001.
[2] S. Morishima, K. Aizawa, and H. Harashima, “An Intelligent Facial Image Coding Driven by Speech and Phoneme,” Proc. Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '89), pp. 1795-1798, 1989.
[3] C. Bregler, M. Covell, and M. Slaney, “Video Rewrite: Driving Visual Speech with Audio,” Proc. ACM SIGGRAPH '97, pp. 353-360, 1997.
[4] F. Huang and T. Chen, “Real-Time Lip-Synch Face Animation Driven by Human Voice,” Proc. IEEE Second Workshop Multimedia Signal Processing, pp. 352-357, 1998.
[5] E. Yamamoto, S. Nakamura, and K. Shikano, “Lip Movement Synthesis from Speech Based on Hidden Markov Models,” Speech Comm., pp. 105-115, 1998.
[6] M. Brand, “Voice Puppetry,” Proc. 26th Ann. Conf. Computer Graphics and Interactive Techniques, pp. 21-28, 1999.
[7] P.S. Aleksic and A.K. Katsaggelos, “Speech-to-Video Synthesis Using Facial Animation Parameters,” IEEE Trans. Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 682-692, 2004.
[8] Y. Li and H.-Y. Shum, “Learning Dynamic Audio-Visual Mapping with Input-Output Hidden Markov Models,” IEEE Trans. Multimedia, vol. 8, no. 3, pp. 542-549, 2006.
[9] J. Xue, J. Borgstrom, J. Jiang, L. Bernstein, and A. Alwan, “Acoustically-Driven Talking Face Synthesis Using Dynamic Bayesian Networks,” Proc. Int'l Conf. Multimedia and Expo (ICME '06), pp. 1165-1168, 2006.
[10] L. Valbonesi, R. Ansari, D. McNeill, F. Quek, S. Duncan, K.E. McCullough, and R. Bryll, “Multimodal Signal Analysis of Prosody and Hand Motion: Temporal Correlation of Speech and Gestures,” Proc. European Signal Processing Conf. (EUSIPCO '02), vol. 1, pp. 75-78, 2002.
[11] K. Munhall, J.A. Jones, D.E. Callan, T. Kuratate, and E. Vatikiotis-Bateson, “Visual Prosody and Speech Intelligibility: Head Movement Improves Auditory Speech Perception,” Psychological Science, vol. 15, no. 2, pp. 133-137, 2004.
[12] F. Quek, D. McNeill, R. Ansari, X. Ma, R. Bryll, S. Duncan, and K. McCullough, “Gesture Cues for Conversational Interaction in Monocular Video,” Proc. Int'l Workshop Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pp. 64-69, 1999.
[13] T. Kuratate, K.G. Munhall, P.E. Rubin, E. Vatikiotis-Bateson, and H. Yehia, “Audio-Visual Synthesis of Talking Faces from Speech Production Correlates,” Proc. Sixth European Conf. Speech Comm. and Technology (EUROSPEECH '99), pp. 1279-1282, 1999.
[14] H.P. Graf, E. Cosatto, V. Strom, and F.J. Huang, “Visual Prosody: Facial Movements Accompanying Speech,” Proc. IEEE Int'l Conf. Automatic Face and Gesture Recognition, pp. 381-386, 2002.
[15] E. Chuang and C. Bregler, “Mood Swings: Expressive Speech Animation,” ACM Trans. Graphics, vol. 24, no. 2, pp. 331-347, 2005.
[16] Z. Deng, C. Busso, S. Narayanan, and U. Neumann, “Audio-Based Head Motion Synthesis for Avatar-Based Telepresence Systems,” Proc. ACM SIGMM Workshop Effective Telepresence (ETP '04), pp. 24-30, 2004.
[17] M.E. Sargin, F. Ofli, Y. Yasinnik, O. Aran, A. Karpov, S. Wilson, E. Erzin, Y. Yemez, and A.M. Tekalp, “Gesture-Speech Correlation Analysis and Speech Driven Gesture Synthesis,” Proc. Int'l Conf. Multimedia and Expo (ICME '06), 2006.
[18] M. Naphade and T. Huang, “Discovering Recurrent Events in Video Using Unsupervised Methods,” Proc. Int'l Conf. Image Processing (ICIP '02), vol. 2, pp. 13-16, 2002.
[19] P. Viola and M. Jones, “Rapid Object Detection Using a Boosted Cascade of Simple Features,” Proc. IEEE Computer Vision and Pattern Recognition (CVPR '01), pp. 511-518, 2001.
[20] R. Lienhart and J. Maydt, “An Extended Set of Haar-Like Features for Rapid Object Detection,” Proc. Int'l Conf. Image Processing (ICIP '02), vol. 1, pp. 900-903, 2002.
[21] J.Y. Bouguet, Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm, OpenCV Documents, Intel Corp., Microprocessor Research Labs, 1999.
[22] M. Brown, D. Burschka, and G. Hager, “Advances in Computational Stereo,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 8, pp. 993-1008, Aug. 2003.
[23] P. Fua, “Combining Stereo and Monocular Information to Compute Dense Depth Maps that Preserve Depth Discontinuities,” Proc. 12th Int'l Joint Conf. Artificial Intelligence, pp. 1292-1298, 1997.
[24] D. Varshalovich, A. Moskalev, and V. Khersonskii, “Description of Rotation in Terms of the Euler Angles,” Quantum Theory of Angular Momentum, World Scientific, 1988.
[25] K. Shoemake, “Animating Rotation with Quaternion Curves,” Proc. 12th Ann. Conf. Computer Graphics and Interactive Techniques, pp. 245-254, 1985.
[26] P. Boersma, “Accurate Short-Term Analysis of the Fundamental Frequency and the Harmonics-to-Noise Ratio of a Sampled Sound,” Proc. Inst. Phonetic Sciences, vol. 17, pp. 97-110, 1993.
[27] S. Ananthakrishnan and S. Narayanan, “An Automatic Prosody Recognizer Using a Coupled Multi-Stream Acoustic Model and a Syntactic-Prosodic Language Model,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '05), vol. 1, 2005.
[28] Point Grey Research Inc., http://www.ptgrey.com/, 2008.
[29] K. Silverman, M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert, and J. Hirschberg, “ToBI: A Standard for Labeling English Prosody,” Proc. Int'l Conf. Spoken Language Processing (ICSLP '92), pp. 867-870, 1992.
[30] Momentum Inc., Speech-Driven Talking Head Avatar, http://www.momentum-dmt.com/, 2008.
[31] Y. Bengio and P. Frasconi, “Input-Output HMMs for Sequence Processing,” IEEE Trans. Neural Networks, vol. 7, no. 5, pp. 1231-1249, 1996.
[32] R. Collobert, S. Bengio, and J. Mariethoz, “Torch: A Modular Machine Learning Software Library,” IDIAP Research Report, vol. 2, p. 46, 2002.
[33] Prosody-Driven Head Gesture Animation, http://mvgl.ku.edu.tr/prosodygesture/, 2008.
[34] J.H. Manton, “Optimisation Algorithms Exploiting Unitary Constraints,” IEEE Trans. Signal Processing, vol. 50, no. 3, pp. 635-650, Mar. 2002.
[35] D. Demirdjian and T. Darrell, “Motion Estimation from Disparity Images,” Proc. Eighth IEEE Int'l Conf. Computer Vision, vol. 1, pp. 213-218, 2001.

