Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in Speech
July 2010 (vol. 32 no. 7)
pp. 1284-1297
Tal Sobol-Shikler, Ben-Gurion University of the Negev, Beer-Sheva
Peter Robinson, University of Cambridge, Cambridge
We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on two observations: affective states can occur simultaneously, and different sets of vocal features, such as intonation and speech rate, distinguish between the nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used different subsets of vocal-feature metrics and different classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross-validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective-state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
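The consolidation step described above, in which 36 pairwise comparisons (one per pair of nine groups) are merged into a single ranked list, can be sketched as a simple pairwise-vote tally. This is a minimal illustration, not the authors' implementation: the group labels, the `pairwise_winner` stand-in for the trained per-pair classifiers, and the `rank_groups` helper are all assumptions for this sketch.

```python
from itertools import combinations

# Illustrative labels only; the paper's nine affective-state groups may differ.
GROUPS = ["absorbed", "excited", "interested", "joyful", "opposed",
          "stressed", "sure", "thinking", "unsure"]

def rank_groups(pairwise_winner, groups):
    """Tally one vote per pairwise win, then rank groups by vote count.

    `pairwise_winner(a, b)` stands in for a trained per-pair classifier
    that decides which of the two groups better matches the utterance.
    """
    votes = {g: 0 for g in groups}
    for a, b in combinations(groups, 2):   # 36 comparisons for 9 groups
        votes[pairwise_winner(a, b)] += 1
    return sorted(groups, key=lambda g: votes[g], reverse=True)

# Toy example: a fixed preference order decides every pairwise comparison.
order = {g: i for i, g in enumerate(GROUPS)}
ranking = rank_groups(lambda a, b: a if order[a] < order[b] else b, GROUPS)
```

The resulting ranking plays the role of the system's output: the top-ranked groups represent the inferred combination of co-occurring affective states for the utterance.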


Index Terms:
Affective computing, human perception, cognition, affective states, emotions, speech, machine learning, intelligent systems, multiclass, multilabel.
Citation:
Tal Sobol-Shikler, Peter Robinson, "Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in Speech," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1284-1297, July 2010, doi:10.1109/TPAMI.2009.107