Vol. 3, No. 2, April-June 2012
F. Eyben, Inst. for Human-Machine Commun., Tech. Univ. München, München, Germany
A. Katsamanis, Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
M. Wöllmer, Inst. for Human-Machine Commun., Tech. Univ. München, München, Germany
A. Metallinou, Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
B. Schuller, Inst. for Human-Machine Commun., Tech. Univ. München, München, Germany
S. Narayanan, Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Human emotional expression tends to evolve in a structured manner, in the sense that certain emotional evolution patterns (e.g., anger to anger) are more probable than others (e.g., anger to happiness). Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations can offer relevant temporal context when classifying the emotional content of a given observation. In this work, we focus on audio-visual recognition of the emotional content of improvised emotional interactions at the utterance level. We examine context-sensitive schemes for emotion recognition within a multimodal, hierarchical approach: bidirectional Long Short-Term Memory (BLSTM) neural networks, hierarchical Hidden Markov Model (HMM) classifiers, and hybrid HMM/BLSTM classifiers are considered for modeling emotion evolution both within an utterance and between utterances over the course of a dialog. Overall, our experimental results indicate that incorporating long-term temporal context is beneficial for emotion recognition systems that encounter a variety of emotional manifestations. Context-sensitive approaches outperform those without context for classification tasks such as discrimination between valence levels or between clusters in the valence-activation space. The analysis of emotional transitions in our database sheds light on the flow of affective expressions, revealing potentially useful patterns.
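The idea of exploiting emotion-transition context between utterances can be illustrated with a first-order transition model decoded by the Viterbi algorithm, in the spirit of the HMM-based schemes described above. The sketch below is not the authors' implementation: the emotion states, transition probabilities, and per-utterance classifier scores are all hypothetical, chosen only to show how context (self-transitions like anger-to-anger being more probable than anger-to-happiness) can resolve an ambiguous utterance.

```python
import math

# Hypothetical emotion states and transition model: staying in the same
# emotion is assumed more likely than abruptly switching to another one.
STATES = ["anger", "happiness", "neutral"]
TRANS = {
    "anger":     {"anger": 0.70, "happiness": 0.10, "neutral": 0.20},
    "happiness": {"anger": 0.10, "happiness": 0.70, "neutral": 0.20},
    "neutral":   {"anger": 0.25, "happiness": 0.25, "neutral": 0.50},
}
PRIOR = {s: 1.0 / len(STATES) for s in STATES}  # uniform initial distribution

def viterbi(frame_scores):
    """Decode the most likely emotion sequence over a dialog.

    frame_scores: list of dicts, one per utterance, mapping each state to a
    (hypothetical) per-utterance classifier likelihood P(observation | state).
    Returns the maximum-probability state sequence under TRANS.
    """
    # Log-domain trellis: state -> (best log probability, best path so far).
    trellis = {
        s: (math.log(PRIOR[s]) + math.log(frame_scores[0][s]), [s])
        for s in STATES
    }
    for obs in frame_scores[1:]:
        new = {}
        for s in STATES:
            # Pick the predecessor state that maximizes path probability.
            best_prev, best_lp = max(
                ((p, lp + math.log(TRANS[p][s]))
                 for p, (lp, _) in trellis.items()),
                key=lambda t: t[1],
            )
            new[s] = (best_lp + math.log(obs[s]), trellis[best_prev][1] + [s])
        trellis = new
    return max(trellis.values(), key=lambda t: t[0])[1]
```

For example, a middle utterance whose classifier scores are ambiguous between anger and happiness is pulled toward anger when both neighbors score strongly as anger, since the transition model penalizes the anger-happiness-anger path.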
audio-visual emotion recognition, context-sensitive learning, temporal context, bidirectional long short-term memory (BLSTM), recurrent neural networks, hidden Markov models, hybrid HMM/BLSTM classifiers, Viterbi algorithm, context modeling, valence-activation space, emotional grammars, machine learning
F. Eyben, A. Katsamanis, M. Wöllmer, A. Metallinou, B. Schuller, and S. Narayanan, "Context-Sensitive Learning for Enhanced Audiovisual Emotion Classification," in IEEE Transactions on Affective Computing, vol. 3, no. 2, pp. 184-198, April-June 2012.