Interdependencies among Voice Source Parameters in Emotional Speech
July-September 2011 (vol. 2, no. 3)
pp. 162-174
Johan Sundberg, KTH Royal Institute of Technology, Stockholm
Sona Patel, University of Geneva, Geneva
Eva Björkner, KTH Royal Institute of Technology, Stockholm
Klaus R. Scherer, University of Geneva, Geneva
Emotions have strong effects on the voice production mechanisms and consequently on voice characteristics. The magnitude of these effects, measured using voice source parameters, and the interdependencies among parameters have not been examined. To better understand these relationships, voice characteristics were analyzed in 10 actors' productions of a sustained /a/ vowel in five emotions. Twelve acoustic parameters were studied and grouped according to their physiological backgrounds: three related to subglottal pressure, five related to the transglottal airflow waveform derived by inverse filtering the audio signal, and four related to vocal fold vibration. Each emotion appeared to possess a specific combination of acoustic parameters reflecting a specific mixture of physiological voice control parameters. Features related to subglottal pressure showed strong within-group and between-group correlations, demonstrating the importance of accounting for vocal loudness in voice analyses. Multiple discriminant analysis revealed that a parameter selection based, in a principled fashion, on production processes could yield rather satisfactory discrimination outcomes (87.1 percent based on 12 parameters and 78 percent based on three parameters). The results of this study suggest that systems for automatic emotion detection should use a hypothesis-driven approach, selecting parameters that directly reflect the physiological mechanisms underlying voice and speech production.
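The classification idea the abstract describes can be illustrated with a toy sketch. This is not the authors' analysis: it substitutes a simplified nearest-centroid discriminant for full multiple discriminant analysis, and every feature value and emotion label below is invented for illustration.

```python
# Toy sketch (NOT the paper's analysis): classifying emotions from
# voice-source feature vectors with a nearest-centroid discriminant,
# a simplified stand-in for multiple discriminant analysis.
# All training values are hypothetical, loosely standing in for the
# paper's three parameter groups: subglottal-pressure-related,
# flow-waveform-related, and vocal-fold-vibration-related features.

TRAIN = [
    ("anger",   [9.0, 0.8, 7.5]), ("anger",   [8.6, 0.9, 7.1]),
    ("fear",    [6.0, 0.5, 8.0]), ("fear",    [6.4, 0.4, 8.3]),
    ("joy",     [7.5, 0.7, 6.0]), ("joy",     [7.8, 0.6, 6.2]),
    ("sadness", [3.0, 0.2, 3.5]), ("sadness", [3.3, 0.3, 3.2]),
    ("neutral", [5.0, 0.5, 5.0]), ("neutral", [5.2, 0.4, 5.1]),
]

def centroids(samples):
    """Mean feature vector per emotion class."""
    sums, counts = {}, {}
    for label, feats in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def classify(feats, cents):
    """Assign the class whose centroid is nearest (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(cents, key=lambda lab: dist2(feats, cents[lab]))

cents = centroids(TRAIN)
print(classify([8.8, 0.85, 7.3], cents))  # lands near the anger centroid
```

A real replication would instead fit discriminant functions over all 12 parameters with pooled covariance, as in the multiple discriminant analysis reported in the abstract; the point here is only the hypothesis-driven structure of the feature set.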

[1] P.N. Juslin and K.R. Scherer, “Vocal Expression of Affect,” The New Handbook of Methods in Nonverbal Behavior Research, J.A. Harrigan, R. Rosenthal, and K.R. Scherer, eds., pp. 65-135, Oxford Univ. Press, 2005.
[2] B. Schuller, S. Reiter, R. Müller, M. Al-Hames, M. Lang, and G. Rigoll, “Speaker Independent Speech Emotion Recognition by Ensemble Classification,” Proc. IEEE Int'l Conf. Multimedia and Expo, pp. 864-867, 2005.
[3] R. Picard, E. Vyzas, and J. Healy, “Toward Machine Emotional Intelligence: Analysis of Affective Physiological State,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1175-1191, Oct. 2001.
[4] C.M. Lee, S. Narayanan, and R. Pieraccini, “Recognition of Negative Emotions from the Speech Signal,” Proc. IEEE Workshop Automatic Speech Recognition and Understanding, 2001.
[5] A.I. Iliev, M.S. Scordilis, J.P. Papa, and A.X. Falcão, “Spoken Emotion Recognition through Optimum-Path Forest Classification Using Glottal Features,” Computer Speech and Language, in press.
[6] J.F. Torres, E. Moore II, and E. Bryant, “A Study of Glottal Waveform Features for Deceptive Speech Classification,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 4489-4492, 2008.
[7] M. Airas and P. Alku, “Emotions in Short Vowel Segments: Effects of the Glottal Flow as Reflected by the Normalized Amplitude Quotient,” Proc. Affective Dialogue Systems Workshop, pp. 13-24, 2004.
[8] E. Moore, M.A. Clements, J.W. Peifer, and L. Weisser, “Critical Analysis of the Impact of Glottal Features in the Classification of Clinical Depression in Speech,” IEEE Trans. Biomedical Eng., vol. 55, no. 1, pp. 96-107, Jan. 2008.
[9] J. Toivanen et al., “Emotions in [a]: A Perceptual and Acoustic Study,” Logopedics, Phoniatrics, Vocology, vol. 31, pp. 43-48, 2006.
[10] C. Pereira, “Dimensions of Emotional Meaning in Speech,” Proc. ISCA Workshop Speech and Emotion, pp. 25-28, 2000.
[11] J. Fontaine, K.R. Scherer, E. Roesch, and P. Ellsworth, “The World of Emotions Is Not Two-Dimensional,” Psychological Science, vol. 18, pp. 1050-1057, 2007.
[12] R.S. Green and N. Cliff, “Multidimensional Comparisons of Structures of Vocally and Facially Expressed Emotion,” Perception and Psychophysics, vol. 17, no. 5, pp. 429-438, 1975.
[13] P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, 1978.
[14] Y.I. Tian, T. Kanade, and J.F. Cohn, “Recognizing Action Units for Facial Expression Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, Feb. 2001.
[15] E. Krumhuber and A. Kappas, “Moving Smiles: The Role of Dynamic Components for the Perception of the Genuineness of Smiles,” J. Nonverbal Behavior, vol. 29, pp. 3-24, 2005.
[16] K.R. Scherer, “Appraisal Considered as a Process of Multilevel Sequential Checking,” Appraisal Processes in Emotion: Theory, Methods, Research, K.R. Scherer, A. Schorr, and T. Johnstone, eds., pp. 92-120, Oxford Univ. Press, 2001.
[17] G. Fant, Speech Acoustics and Phonetics. Kluwer Academic Publishers, 2004.
[18] J. Gauffin and J. Sundberg, “Spectral Correlates of Glottal Voice Source Waveform Characteristics,” J. Speech and Hearing Research, vol. 32, pp. 556-565, 1989.
[19] J. Sundberg, M. Andersson, and C. Hultqvist, “Effects of Subglottal Pressure Variation on Professional Baritone Singers' Voice Sources,” J. Acoustical Soc. Am., vol. 105, pp. 1965-1971, 1999.
[20] J. Sundberg, E. Fahlstedt, and A. Morell, “Effects on the Glottal Voice Source of Vocal Loudness Variation in Untrained Female and Male Voices,” J. Acoustical Soc. Am., vol. 117, no. 2, pp. 879-885, 2005.
[21] J. Sundberg, M. Thalén, P. Alku, and E. Vilkman, “Estimating Perceived Phonatory Pressedness in Singing from Flow Glottograms,” J. Voice, vol. 18, pp. 56-62, 2004.
[22] H.M. Hanson, “Glottal Characteristics of Female Speakers: Acoustic Correlates,” J. Acoustical Soc. Am., vol. 101, no. 1, pp. 466-481, 1997.
[23] P. Ladefoged and N.P. McKinney, “Loudness, Sound Pressure, and Subglottal Pressure in Speech,” J. Acoustical Soc. Am., vol. 35, pp. 454-460, 1963.
[24] S. Patel, K.R. Scherer, J. Sundberg, and E. Björkner, “Mapping Emotions into Acoustic Space: The Role of Voice Production,” Biological Psychology, vol. 87, pp. 93-98, 2011.
[25] A. Batliner, K. Fischer, R. Huber, J. Spilker, and E. Nöth, “Desperately Seeking Emotions or: Actors, Wizards, and Human Beings,” Proc. ISCA Workshop Speech and Emotion, pp. 195-200, 2000.
[26] T. Vogt and E. André, “Improving Automatic Emotion Recognition from Speech via Gender Differentiation,” Proc. IEEE Int'l Conf. Multimedia, 2005.
[27] S. Lee, S. Yildirim, A. Kazemzadeh, and S. Narayanan, “An Articulatory Study of Emotional Speech Production,” Proc. Interspeech, pp. 497-500, 2005.
[28] T. Bänziger and K.R. Scherer, “Introducing the Geneva Multimodal Emotion Portrayal (GEMEP) Corpus,” A Blueprint for an Affectively Competent Agent: Cross-Fertilization between Emotion Psychology, Affective Neuroscience, and Affective Computing, K.R. Scherer, T. Bänziger, and E. Roesch, eds., Oxford Univ. Press, 2010.
[29] J. Sundberg and M. Nordenberg, “Effects of Vocal Loudness Variation on Spectrum Balance as Reflected by the Alpha Measure of Long-Term-Average Spectra of Speech,” J. Acoustical Soc. Am., vol. 120, pp. 453-457, 2006.
[30] P. Boersma and D. Weenink, “Praat: Doing Phonetics by Computer [Computer Program],” Version 5.1.43, Retrieved 4 Aug. 2010 from http:/, 2010.
[31] F. Roers, D. Mürbe, and J. Sundberg, “Predicted Singers' Vocal Fold Lengths and Voice Classification—A Study of X-Ray Morphological Measures,” J. Voice, vol. 23, no. 4, pp. 408-413, 2009.

Index Terms:
Paralanguage analysis, affect sensing and analysis, affective computing, voice source, vocal physiology.
Johan Sundberg, Sona Patel, Eva Björkner, Klaus R. Scherer, "Interdependencies among Voice Source Parameters in Emotional Speech," IEEE Transactions on Affective Computing, vol. 2, no. 3, pp. 162-174, July-Sept. 2011, doi:10.1109/T-AFFC.2011.14