Issue No. 01 - Jan.-March 2012 (vol. 3)
pp. 5-17
M. Valstar, Department of Computing, Imperial College London, London, UK
G. McKeown, School of Psychology, Queen's University Belfast, Belfast, UK
M. Pantic, Department of Computing, Imperial College London, London, UK
M. Schröder, Language Technology Lab, DFKI GmbH, Saarbrücken, Germany
ABSTRACT
SEMAINE has created a large audiovisual database as part of an iterative approach to building Sensitive Artificial Listener (SAL) agents that can engage a person in a sustained, emotionally colored conversation. Data used to build the agents came from interactions between users and an "operator" simulating a SAL agent, in different configurations: Solid SAL (designed so that operators displayed appropriate nonverbal behavior) and Semi-automatic SAL (designed so that users' experience approximated interacting with a machine). We then recorded user interactions with the developed system, Automatic SAL, comparing the most communicatively competent version to versions with reduced nonverbal skills. High-quality recording was provided by five high-resolution, high-frame-rate cameras and four microphones, recorded synchronously. The recordings cover 150 participants, for a total of 959 conversations with individual SAL characters, each lasting approximately five minutes. Solid SAL recordings are transcribed and extensively annotated: 6-8 raters per clip traced five affective dimensions and 27 associated categories. Other scenarios are labeled on the same pattern, but less fully. Additional information includes FACS annotation on selected extracts, identification of laughs, nods, and shakes, and measures of user engagement with the automatic system. The material is available through a web-accessible database.
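To illustrate how the continuous, per-rater dimension traces described in the abstract might be combined, the following is a minimal Python sketch that averages several raters' valence traces for one clip onto a common time grid. The file names and the one-trace-per-rater CSV layout of (time, value) pairs are hypothetical assumptions for illustration only; consult the database documentation for the actual annotation format.

```python
"""Illustrative sketch (not part of the SEMAINE release): averaging the
continuous valence traces supplied by several raters for one clip.
File names, paths, and column layout are hypothetical assumptions."""
import csv
from collections import defaultdict
from statistics import mean

def load_trace(path):
    """Read one rater's trace as a list of (time_seconds, value) pairs."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        return [(float(t), float(v)) for t, v in reader]

def consensus_trace(paths, step=0.1):
    """Average several raters' traces on a common time grid (step in seconds).

    Each trace is sampled by carrying its last reported value forward to each
    grid point, then the values from all raters are averaged per grid point.
    """
    traces = [load_trace(p) for p in paths]
    end = min(trace[-1][0] for trace in traces)          # shared duration
    grid = [i * step for i in range(int(end / step) + 1)]
    samples = defaultdict(list)
    for trace in traces:
        idx = 0
        for t in grid:
            while idx + 1 < len(trace) and trace[idx + 1][0] <= t:
                idx += 1
            samples[t].append(trace[idx][1])
    return [(t, mean(samples[t])) for t in grid]

if __name__ == "__main__":
    # Hypothetical per-rater valence files for a single Solid SAL clip.
    rater_files = [f"clip042_valence_rater{i}.csv" for i in range(1, 7)]
    for t, v in consensus_trace(rater_files)[:5]:
        print(f"{t:5.1f}s  valence={v:+.3f}")
```

Carrying the last value forward keeps the sketch independent of the raters' individual sampling rates; any interpolation scheme appropriate to the real trace format could be substituted.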
INDEX TERMS
visual databases, audio databases, behavioural sciences computing, cameras, interactive systems, Internet, iterative methods, microphones, natural language processing, Web-accessible database, SEMAINE database, annotated multimodal records, emotionally colored conversations, limited agent, audiovisual database, iterative approach, sensitive artificial listener agents, solid SAL, semiautomatic SAL, user interactions, Automatic SAL, reduced nonverbal skills, high-resolution high-framerate cameras, affective dimensions, FACS annotation, Databases, Humans, Solids, Face, Buildings, Speech, social signal processing, Emotional corpora, affective annotation, affective computing
CITATION
M. Valstar, G. McKeown, M. Pantic, M. Schröder, "The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent", IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 5-17, Jan.-March 2012, doi:10.1109/T-AFFC.2011.20
REFERENCES
[1] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J.G. Taylor, “Emotion Recognition in Human-Computer Interaction,” IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 32-80, Jan. 2001.
[2] R. Cowie, E. Douglas-Cowie, and C. Cox, “Beyond Emotion Archetypes: Databases for Emotion Modelling Using Neural Networks,” Neural Networks, vol. 18, no. 4, pp. 371-388, 2005.
[3] R. Cowie and M. Schröder, “Piecing Together the Emotion Jigsaw,” Machine Learning for Multimodal Interaction, vol. 3361, pp. 305-317, 2005.
[4] R. Cowie, E. Douglas-Cowie, J.-C. Martin, and L. Devillers, “The Essential Role of Human Databases for Learning in and Validation of Affectively Competent Agents,” A Blueprint for Affective Computing: A Sourcebook and Manual, K.R. Scherer, T. Bänziger, and E. Roesch, eds., pp. 151-165, Oxford Univ. Press, 2010.
[5] S. Afzal and P. Robinson, “Natural Affect Data—Collection & Annotation in a Learning Context,” Proc. Third Int'l Conf. Affective Computing and Intelligent Interaction and Workshops, pp. 1-7, 2009.
[6] R. Cowie, E. Douglas-Cowie, I. Sneddon, M. McRorie, J. Hanratty, E. McMahon, and G. McKeown, “Induction Techniques Developed to Illuminate Relationships Between Signs of Emotion and Their Context, Physical and Social,” A Blueprint for Affective Computing: A Sourcebook and Manual, K.R. Scherer, T. Bänziger, and E. Roesch, eds., pp. 295-307, Oxford Univ. Press, 2010.
[7] D. Heylen, D. Reidsma, and R. Ordelman, “Annotating State of Mind in Meeting Data,” Proc. Workshop Programme Corpora for Research on Emotion and Affect, pp. 84-170, 2006.
[8] P. Ekman and W.V. Friesen, Pictures of Facial Affect. Consulting Psychologists Press, 1976.
[9] M. Kienast and W. Sendlmeier, “Acoustical Analysis of Spectral and Temporal Changes in Emotional Speech,” Proc. ISCA Tutorial and Research Workshop Speech and Emotion, 2000.
[10] T. Kanade, J. Cohn, and Y. Tian, “Comprehensive Database for Facial Expression Analysis,” Proc. IEEE Fourth Int'l Conf. Automatic Face and Gesture Recognition, pp. 46-53, 2000.
[11] T. Bänziger and K.R. Scherer, “Using Actor Portrayals to Systematically Study Multimodal Emotion Expression: The GEMEP Corpus,” Proc. Second Int'l Conf. Affective Computing and Intelligent Interaction, pp. 476-487, 2007.
[12] J. Hanratty, “Individual and Situational Differences in Emotional Expression,” PhD dissertation, School of Psychology, Queen's Univ. Belfast, 2010.
[13] E. Douglas-Cowie, R. Cowie, and M. Schröder, “A New Emotion Database: Considerations, Sources and Scope,” Proc. ISCA Tutorial and Research Workshop Speech and Emotion, 2000.
[14] S. Abrilian, L. Devillers, S. Buisine, and J.-C. Martin, “EmoTV1: Annotation of Real-Life Emotions for the Specification of Multimodal Affective Interfaces,” Proc. 11th Int'l Conf. Human-Computer Interaction, July 2005.
[15] M. Grimm, K. Kroschel, and S. Narayanan, “The Vera am Mittag German Audio-Visual Emotional Speech Database,” Proc. IEEE Int'l Conf. Multimedia and Expo, pp. 865-868, 2008.
[16] A. Batliner, C. Hacker, S. Steidl, E. Nöth, S. D'Arcy, M. Russell, and M. Wong, “‘You Stupid Tin Box’—Children Interacting with the Aibo Robot: A Cross-Linguistic Emotional Speech Corpus,” Proc. Fourth Int'l Conf. Language Resources and Evaluation, pp. 171-174, 2004.
[17] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, “Web-Based Database for Facial Expression Analysis,” Proc. IEEE Int'l Conf. Multimedia and Expo, p. 5, 2005.
[18] A. Vinciarelli, A. Dielmann, S. Favre, and H. Salamin, “Canal9: A Database of Political Debates for Analysis of Social Interactions,” Proc. Int'l Conf. Affective Computing and Intelligent Interaction, pp. 1-4, 2009.
[19] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J.N. Chang, S. Lee, and S.S. Narayanan, “IEMOCAP: Interactive Emotional Dyadic Motion Capture Database,” J. Language Resources and Evaluation, vol. 42, no. 4, pp. 335-359, 2008.
[20] M. Schröder et al., “Building Autonomous Sensitive Artificial Listeners,” IEEE Trans. Affective Computing, under revision.
[21] E. Douglas-Cowie, R. Cowie, C. Cox, N. Amir, and D. Heylen, “The Sensitive Artificial Listener: An Induction Technique for Generating Emotionally Coloured Conversation,” Proc. Workshop Corpora for Research on Emotion and Affect, 2008.
[22] G. Caridakis, K. Karpouzis, M. Wallace, L. Kessous, and N. Amir, “Multimodal User's Affective State Analysis in Naturalistic Interaction,” J. Multimodal User Interfaces, vol. 3, pp. 49-66, 2010.
[23] R. Cowie and R. Cornelius, “Describing the Emotional States that Are Expressed in Speech,” Speech Comm., vol. 40, nos. 1/2, pp. 5-32, 2003.
[24] R. Cowie, E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey, and M. Schröder, “'FEELTRACE': An Instrument for Recording Perceived Emotion in Real Time,” Proc. ISCA Tutorial and Research Workshop Speech and Emotion, 2000.
[25] J. Russell and L. Barrett, “Core Affect, Prototypical Emotional Episodes, and Other Things Called Emotion: Dissecting the Elephant,” J. Personality and Social Psychology, vol. 76, no. 5, pp. 805-819, 1999.
[26] R. Cowie, C. Cox, J.-C. Martin, A. Batliner, D. Heylen, and K. Karpouzis, “Issues in Data Labelling,” Emotion-Oriented Systems: The Humaine Handbook, R. Cowie, C. Pelachaud, and P. Petta, eds., pp. 213-241, Springer-Verlag, 2011.
[27] G. McKeown, “Chatting with a Virtual Agent: The Semaine Project Character Spike [Video File],” http://youtu.be/6KZc6e_EuCg, 2011.
[28] P. Valdez and A. Mehrabian, “Effects of Color on Emotions,” J. Experimental Psychology-General, vol. 123, pp. 394-408, 1994.
[29] J. Lichtenauer, J. Shen, M. Valstar, and M. Pantic, “Cost-Effective Solution to Synchronised Audio-Visual Data Capture Using Multiple Sensors,” Proc. IEEE Int'l Conf. Advanced Video and Signal Based Surveillance, pp. 324-329, 2010.
[30] J.R.J. Fontaine, K.R. Scherer, E.B. Roesch, and P.C. Ellsworth, “The World of Emotions Is Not Two-Dimensional,” Psychological Science, vol. 18, no. 12, pp. 1050-1057, 2007.
[31] P. Ekman, “Basic Emotions,” Handbook of Cognition and Emotion, pp. 45-60, John Wiley, 1999.
[32] S. Baron-Cohen, O. Golan, S. Wheelwright, and J.J. Hill, Mind Reading: The Interactive Guide to Emotions. Jessica Kingsley Publishers, 2004.
[33] R.F. Bales, Interaction Process Analysis: A Method for the Study of Groups. Addison Wesley, 1951.
[34] P. Ekman, W.V. Friesen, M. O'Sullivan, A. Chan, I. Diacoyanni-Tarlatzis, K. Heider, R. Krause, W.A. LeCompte, T. Pitcairn, and P.E. Ricci-Bitti, “Universals and Cultural Differences in the Judgments of Facial Expressions of Emotion,” J. Personality and Social Psychology, vol. 53, no. 4, pp. 712-717, 1987.
[35] E. McClave, “Linguistic Functions of Head Movements in the Context of Speech,” J. Pragmatics, vol. 32, no. 7, pp. 855-878, 2000.
[36] R. Cowie, H. Gunes, G. McKeown, L. Vaclavu-Schneider, J. Armstrong, and E. Douglas-Cowie, “The Emotional and Communicative Significance of Head Nods and Shakes in a Naturalistic Database,” Proc. LREC Int'l Workshop Emotion, pp. 42-46, 2010.
[37] P. Ekman and W.V. Friesen, Facial Action Coding System. Consulting Psychologists Press, 1978.
[38] R. Cowie and G. McKeown, “Statistical Analysis of Data from Initial Labelled Database and Recommendations for an Economical Coding Scheme,” http://www.semaine-project.eu/, 2010.
[39] B. Jiang, M. Valstar, and M. Pantic, “Action Unit Detection Using Sparse Appearance Descriptors in Space-Time Video Volumes,” Proc. IEEE Int'l Conf. Face and Gesture Recognition, 2011.
[40] H. Gunes and M. Pantic, “Dimensional Emotion Prediction from Spontaneous Head Gestures for Interaction with Sensitive Artificial Listeners,” Proc. Int'l Conf. Intelligent Virtual Agents, pp. 371-377, 2010.
[41] H. Gunes and M. Pantic, “Automatic, Dimensional and Continuous Emotion Recognition,” Int'l J. Synthetic Emotions, vol. 1, no. 1, pp. 68-99, 2010.
[42] M.A. Nicolaou, H. Gunes, and M. Pantic, “Automatic Segmentation of Spontaneous Data Using Dimensional Labels from Multiple Coders,” Proc. LREC Workshop Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, pp. 43-48, 2010.
[43] F. Eyben, M. Wollmer, M. Valstar, H. Gunes, B. Schuller, and M. Pantic, “String-Based Audiovisual Fusion of Behavioural Events for the Assessment of Dimensional Affect,” Proc. IEEE Int'l Conf. Face and Gesture Recognition, 2011.
[44] B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Müller, and S. Narayanan, “The Interspeech 2010 Paralinguistic Challenge,” Proc. 11th Ann. Conf. of the Int'l Speech Comm. Assoc., pp. 2794-2797, 2010.
[45] M. Valstar, B. Jiang, M. Mehu, M. Pantic, and K. Scherer, “The First Facial Expression Recognition and Analysis Challenge,” Proc. IEEE Int'l Conf. Automatic Face and Gesture Recognition, 2011.
[46] K. Scherer and H. Ellgring, “Are Facial Expressions of Emotion Produced by Categorical Affect Programs or Dynamically Driven by Appraisal?,” Emotion, vol. 7, no. 1, pp. 113-130, 2007.
[47] K. Scherer and H. Ellgring, “Multimodal Expression of Emotion: Affect Programs or Componential Appraisal Patterns?,” Emotion, vol. 7, no. 1, pp. 158-171, 2007.
[48] R. Cowie, “Perceiving Emotion: Towards a Realistic Understanding of the Task,” Philosophical Trans. Royal Soc. London B, vol. 364, no. 1535, pp. 3515-3525, 2009.