Building Autonomous Sensitive Artificial Listeners
April-June 2012 (vol. 3 no. 2)
pp. 165-183
E. de Sevin, LIP6, UPMC, Paris, France
M. Pantic, Dept. of Comput., Imperial Coll. London, London, UK
C. Pelachaud, LTCI, Telecom ParisTech, Paris, France
B. Schuller, Tech. Univ. München, München, Germany
S. Pammi, DFKI GmbH, Saarbrücken, Germany
G. McKeown, Sch. of Psychol., Queen's Univ. Belfast, Belfast, UK
D. Heylen, Human Media Interaction, Univ. Twente, Enschede, Netherlands
M. ter Maat, Human Media Interaction, Univ. Twente, Enschede, Netherlands
F. Eyben, Tech. Univ. München, München, Germany
H. Gunes, Sch. of Electron. Eng. & Comput. Sci. (EECS), Queen Mary Univ. of London, London, UK
E. Bevacqua, Centre Européen de Réalité Virtuelle, ENIB, Plouzané, France
R. Cowie, Sch. of Psychol., Queen's Univ. Belfast, Belfast, UK
M. Schröder, DFKI GmbH, Saarbrücken, Germany
M. Valstar, Dept. of Comput., Imperial Coll. London, London, UK
M. Wöllmer, Tech. Univ. München, München, Germany
This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and nonverbal interaction capabilities. The work is motivated by the aim of providing technology with the competences needed to perceive and produce the emotional and nonverbal behaviors required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting that seems particularly well suited to the study of emotional and nonverbal behavior because it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on nonverbal capabilities without simultaneously having to address the challenges of spoken language understanding, task modeling, etc. We first report on three prototype versions of the SAL scenario in which the behavior of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served to verify the effectiveness of the SAL scenario and allowed us to collect the data required for building system components that analyze and synthesize the respective behaviors. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behavior, dialogue management, and synthesis of speaker and listener behavior of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Because the system is designed for modularity and reuse, and because it is publicly available, it has potential as a joint research tool in the affective computing research community.
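
The architecture summarized above, incremental analysis of user behavior feeding a dialogue manager that in turn drives the synthesis of listener and speaker behavior, can be pictured as loosely coupled components exchanging messages. The following Python sketch is purely illustrative: the MessageBus class, the topic names, and the decision thresholds are invented for this example and stand in for, rather than reproduce, the actual SAL components and middleware.

# Illustrative sketch only: all class, topic, and threshold names are hypothetical
# and are not the real SAL/SEMAINE components described in the paper.
import queue
import time
from dataclasses import dataclass, field

@dataclass
class Message:
    topic: str                       # e.g. "state.user" or "action.agent"
    payload: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

class MessageBus:
    """Minimal publish/subscribe hub standing in for the middleware that keeps
    analysis, dialogue management, and synthesis loosely coupled."""
    def __init__(self):
        self._queues = {}            # topic -> list of subscriber queues

    def subscribe(self, topic):
        q = queue.Queue()
        self._queues.setdefault(topic, []).append(q)
        return q

    def publish(self, msg):
        for q in self._queues.get(msg.topic, []):
            q.put(msg)

def analyse_frame(bus, frame):
    """Incremental analysis: turn one audio-visual frame into a running
    valence/arousal estimate of the user's state."""
    estimate = {"valence": frame.get("smile", 0.0) - frame.get("frown", 0.0),
                "arousal": frame.get("voice_energy", 0.0)}
    bus.publish(Message("state.user", estimate))

def manage_dialogue(bus, user_states, agent_actions):
    """Dialogue management: map the latest user state to a listener action
    (back-channel, take the turn, or keep listening)."""
    state = user_states.get()
    if state.payload["arousal"] > 0.7:
        action = {"type": "backchannel", "form": "nod"}
    elif state.payload["valence"] < -0.5:
        action = {"type": "take_turn", "intent": "console"}
    else:
        action = {"type": "listen"}
    bus.publish(Message("action.agent", action))
    return agent_actions.get()       # what the synthesis component would render

# Wire the pipeline together for a few simulated frames.
bus = MessageBus()
user_states = bus.subscribe("state.user")
agent_actions = bus.subscribe("action.agent")
for frame in [{"smile": 0.8, "voice_energy": 0.9},
              {"frown": 0.7, "voice_energy": 0.2}]:
    analyse_frame(bus, frame)
    print(manage_dialogue(bus, user_states, agent_actions).payload)

In the real system each stage runs continuously on live audio and video, and the components are far richer than these stubs; the sketch only conveys the decoupled, message-passing shape of the pipeline.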

Index Terms:
interactive systems, behavioural sciences computing, emotion recognition, emotion synthesis, autonomous sensitive artificial listeners, real-time interactive multimodal dialogue system, nonverbal interaction capabilities, emotional capabilities, spoken language understanding, task modeling, autonomous integrated real-time system, user behavior, dialogue management, speaker behavior, listener behavior, SAL character, speech recognition, real-time systems, embodied conversational agents, rapport agents, real-time dialogue, turn-taking
Citation:
E. de Sevin, M. Pantic, C. Pelachaud, B. Schuller, S. Pammi, G. McKeown, D. Heylen, M. ter Maat, F. Eyben, H. Gunes, E. Bevacqua, R. Cowie, M. Schröder, M. Valstar, M. Wöllmer, "Building Autonomous Sensitive Artificial Listeners," IEEE Transactions on Affective Computing, vol. 3, no. 2, pp. 165-183, April-June 2012, doi:10.1109/T-AFFC.2011.34