Issue No. 2, April-June 2013 (vol. 4), pp. 183-196
S. Mariooryad, Multimodal Signal Processing Lab., University of Texas at Dallas, Richardson, TX, USA
C. Busso, Multimodal Signal Processing Lab., University of Texas at Dallas, Richardson, TX, USA
ABSTRACT
Psycholinguistic studies on human communication have shown that during interaction, individuals tend to adapt their behaviors, mimicking the speaking style, gestures, and expressions of their conversational partners. This synchronization pattern is referred to as entrainment. This study investigates the presence of entrainment at the emotion level in cross-modality settings and its implications for multimodal emotion recognition systems. The analysis explores the relationship between the acoustic features of the speaker and the facial expressions of the interlocutor during dyadic interactions. The analysis shows that the speakers displayed similar emotions 72 percent of the time, indicating strong mutual influence in their expressive behaviors. We also investigate cross-modality, cross-speaker dependence using a mutual information framework. The study reveals a strong relation between the facial and acoustic features of one subject and the emotional state of the other subject. It also shows strong dependence between heterogeneous modalities across conversational partners. These findings suggest that the expressive behaviors of one dialog partner provide complementary information for recognizing the emotional state of the other dialog partner. The analysis motivates classification experiments exploiting cross-modality, cross-speaker information. The study presents emotion recognition experiments using the IEMOCAP and SEMAINE databases. The results demonstrate the benefit of exploiting this emotional entrainment effect, showing statistically significant improvements.
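The cross-speaker dependence analysis mentioned above rests on estimating mutual information between one partner's features and the other partner's emotional state. The sketch below is a hypothetical illustration of that kind of estimate, not the authors' implementation: the feature choice (a quantized F0 statistic per turn), the binning scheme, and the toy data are assumptions made only for the example.

```python
# Hypothetical sketch (not the paper's code): plug-in mutual information
# between a quantized acoustic feature of the speaker and the categorical
# emotion label of the interlocutor, computed per dialog turn.
import numpy as np

def mutual_information(x_bins, y_bins):
    """Plug-in MI estimate (in bits) between two discrete sequences."""
    joint = np.histogram2d(x_bins, y_bins,
                           bins=(np.max(x_bins) + 1, np.max(y_bins) + 1))[0]
    pxy = joint / joint.sum()                    # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)          # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)          # marginal p(y)
    nz = pxy > 0                                 # avoid log(0) terms
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Toy data: speaker F0 medians quantized into 4 bins, and the interlocutor's
# emotion label per turn (e.g., 0=neutral, 1=happy, 2=angry, 3=sad).
rng = np.random.default_rng(0)
speaker_f0_bin = rng.integers(0, 4, size=200)
interlocutor_emotion = (speaker_f0_bin + rng.integers(0, 2, size=200)) % 4

print(mutual_information(speaker_f0_bin, interlocutor_emotion))
```

A value close to zero would indicate no measurable cross-speaker dependence for that feature, while larger values suggest the speaker's acoustics carry information about the interlocutor's emotional state.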
INDEX TERMS
Emotion recognition, acoustics, feature extraction, mutual information, speech, databases, facial features, emotionally expressive speech, entrainment, multimodal interaction, cross-subject multimodal emotion recognition, facial expressions
CITATION
S. Mariooryad and C. Busso, "Exploring Cross-Modality Affective Reactions for Audiovisual Emotion Recognition," IEEE Transactions on Affective Computing, vol. 4, no. 2, pp. 183-196, April-June 2013, doi:10.1109/T-AFFC.2013.11