Continuous Prediction of Spontaneous Affect from Multiple Cues and Modalities in Valence-Arousal Space
April-June 2011 (vol. 2 no. 2)
pp. 92-105
Mihalis A. Nicolaou, Imperial College London, London
Hatice Gunes, Imperial College London, London
Maja Pantic, Imperial College London, London and University of Twente, The Netherlands
Past research on the analysis of human affect has focused on recognizing prototypic expressions of six basic emotions from posed data acquired in laboratory settings. Recently, there has been a shift toward subtle, continuous, and context-specific interpretations of affective displays recorded in naturalistic and real-world settings, and toward multimodal analysis and recognition of human affect. Converging with this shift, this paper presents, to the best of our knowledge, the first approach in the literature that: 1) fuses facial expression, shoulder gesture, and audio cues for dimensional and continuous prediction of emotions in valence and arousal space, 2) compares the performance of two state-of-the-art machine learning techniques applied to the target problem, bidirectional Long Short-Term Memory neural networks (BLSTM-NNs) and Support Vector Machines for Regression (SVR), and 3) proposes an output-associative fusion framework that incorporates correlations and covariances between the emotion dimensions. The proposed approach is evaluated on the spontaneous SAL data from four subjects using subject-dependent leave-one-sequence-out cross-validation. The experimental results show that: 1) on average, BLSTM-NNs outperform SVR due to their ability to learn past and future context, 2) the proposed output-associative fusion framework outperforms feature-level and model-level fusion by modeling and learning correlations and patterns between the valence and arousal dimensions, and 3) the proposed system is well able to reproduce the valence and arousal ground truth obtained from human coders.
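
The idea behind output-associative fusion, as the abstract describes it, can be illustrated with a small two-stage sketch: each cue first predicts both valence and arousal on its own, and a second stage then predicts each final dimension from all of those intermediate outputs, so that the final valence estimate can draw on arousal predictions and vice versa. The code below is only a minimal illustration under stated assumptions. It substitutes plain ridge regression for the BLSTM-NN and SVR predictors used in the paper, runs on synthetic data, and the names `fit_ridge`, `predict_ridge`, `output_associative_fusion`, and `demo` are hypothetical. Any further details of the authors' framework are not given in the abstract and are not reproduced here.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with a bias column (stand-in for SVR/BLSTM-NN)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict_ridge(w, X):
    """Apply a weight vector returned by fit_ridge."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def output_associative_fusion(train_feats, train_va, test_feats):
    """Two-stage fusion sketch.

    train_feats / test_feats: dict mapping a cue name to a (frames, dims) array.
    train_va: (frames, 2) array of [valence, arousal] labels.
    Returns a (test_frames, 2) array of fused [valence, arousal] predictions.
    """
    # Stage 1: every cue predicts valence AND arousal independently.
    z_train, z_test = [], []
    for name in sorted(train_feats):
        for dim in range(2):
            w = fit_ridge(train_feats[name], train_va[:, dim])
            z_train.append(predict_ridge(w, train_feats[name]))
            z_test.append(predict_ridge(w, test_feats[name]))
    Z_train = np.column_stack(z_train)
    Z_test = np.column_stack(z_test)

    # Stage 2: each final dimension is regressed on ALL stage-1 outputs,
    # so cross-dimension correlations can be exploited.
    # Simplification: stage 2 is trained on stage-1 training predictions.
    fused = []
    for dim in range(2):
        w = fit_ridge(Z_train, train_va[:, dim])
        fused.append(predict_ridge(w, Z_test))
    return np.column_stack(fused)


def demo(seed=0, n_train=200, n_test=50):
    """Run the sketch on synthetic, correlated valence/arousal data."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    valence = rng.normal(size=n)
    arousal = 0.6 * valence + 0.8 * rng.normal(size=n)  # correlated dimensions
    va = np.column_stack([valence, arousal])
    # Hypothetical cue features: noisy linear projections of the labels.
    feats = {
        name: va @ rng.normal(size=(2, d)) + 0.5 * rng.normal(size=(n, d))
        for name, d in [("face", 6), ("shoulders", 4), ("audio", 5)]
    }
    train = {k: v[:n_train] for k, v in feats.items()}
    test = {k: v[n_train:] for k, v in feats.items()}
    fused = output_associative_fusion(train, va[:n_train], test)
    mse = np.mean((fused - va[n_train:]) ** 2, axis=0)
    return fused, mse


if __name__ == "__main__":
    fused, mse = demo()
    print("fused shape:", fused.shape)
    print("MSE [valence, arousal]:", mse)
```

For contrast, a feature-level fusion baseline in this setting would concatenate the three cue feature arrays and fit one regressor per dimension, without the intermediate cross-dimension step.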

Index Terms:
Dimensional affect recognition, continuous affect prediction, valence and arousal dimensions, facial expressions, shoulder gestures, emotional acoustic signals, multicue and multimodal fusion, output-associative fusion.
Citation:
Mihalis A. Nicolaou, Hatice Gunes, Maja Pantic, "Continuous Prediction of Spontaneous Affect from Multiple Cues and Modalities in Valence-Arousal Space," IEEE Transactions on Affective Computing, vol. 2, no. 2, pp. 92-105, April-June 2011, doi:10.1109/T-AFFC.2011.9