Issue No. 2, July-December 2010 (vol. 1), pp. 119-131
Björn Schuller , Technische Universität München, München
Bogdan Vlasenko , Otto-von-Guericke-Universität (OVGU), Magdeburg
Florian Eyben , Technische Universität München, München
Martin Wöllmer , Technische Universität München, München
André Stuhlsatz , University of Applied Sciences Düsseldorf, Düsseldorf
Andreas Wendemuth , Otto-von-Guericke-Universität (OVGU), Magdeburg
Gerhard Rigoll , Technische Universität München, München
ABSTRACT
As the recognition of emotion from speech has matured to a degree where it becomes applicable in real-life settings, it is time for a realistic view on obtainable performance. Most studies tend to overestimate performance in this respect: Acted data is often used rather than spontaneous data, results are reported on preselected prototypical data, and truly speaker-disjunctive partitioning is still less common than simple cross-validation. Yet even speaker-disjunctive evaluation gives only limited insight into the generalization ability of today's emotion recognition engines, since the training and test data used for system development usually tend to be similar in terms of recording conditions, noise overlay, language, and types of emotion. A considerably more realistic impression can be gathered by interset evaluation: We therefore report results employing six standard databases in a cross-corpora evaluation experiment, which may also help in judging the chances of adding further resources for training and thereby overcoming the typical data sparseness in the field. To better cope with the observed high variances, different types of normalization are investigated. In total, 1.8 k individual evaluations indicate the crucial performance inferiority of inter- compared to intracorpus testing.
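To make the evaluation protocol concrete, the following is a minimal sketch of a cross-corpus experiment with per-corpus feature normalization, in the spirit of the setup described above. It is not the authors' actual pipeline: the data, the function name cross_corpus_eval, and the choice of classifier are placeholders, and standard scikit-learn components merely stand in for the feature normalization and classification steps.

```python
# Minimal sketch of cross-corpus emotion recognition with per-corpus
# (z-score) feature normalization. All data and names are placeholders;
# the paper's actual acoustic features and classifier setup are not reproduced.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import recall_score


def cross_corpus_eval(corpora):
    """corpora: dict mapping corpus name -> (feature matrix, label vector)."""
    results = {}
    for train_name, (X_tr, y_tr) in corpora.items():
        # Corpus normalization: standardize each corpus on its own statistics.
        clf = LinearSVC().fit(StandardScaler().fit_transform(X_tr), y_tr)
        for test_name, (X_te, y_te) in corpora.items():
            if test_name == train_name:
                continue  # intracorpus results would need speaker-disjunctive folds
            y_pred = clf.predict(StandardScaler().fit_transform(X_te))
            # Unweighted average recall copes with unbalanced emotion classes.
            results[(train_name, test_name)] = recall_score(
                y_te, y_pred, average="macro")
    return results


if __name__ == "__main__":
    # Toy random data standing in for acoustic feature functionals.
    rng = np.random.default_rng(0)
    toy = {name: (rng.normal(size=(60, 20)), rng.integers(0, 2, size=60))
           for name in ("corpusA", "corpusB", "corpusC")}
    for pair, uar in cross_corpus_eval(toy).items():
        print(pair, round(uar, 3))
```

Intracorpus baselines would additionally require speaker-disjunctive folds within each corpus, as emphasized in the abstract.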
INDEX TERMS
Affective computing, speech emotion recognition, cross-corpus evaluation, normalization
CITATION
Björn Schuller, Bogdan Vlasenko, Florian Eyben, Martin Wöllmer, André Stuhlsatz, Andreas Wendemuth, Gerhard Rigoll, "Cross-Corpus Acoustic Emotion Recognition: Variances and Strategies", IEEE Transactions on Affective Computing, vol.1, no. 2, pp. 119-131, July-December 2010, doi:10.1109/T-AFFC.2010.8
REFERENCES
[1] E. Scripture , “A Study of Emotions by Speech Transcription,” Vox, vol. 31, pp. 179-183, 1921.
[2] E. Skinner , “A Calibrated Recording and Analysis of the Pitch, Force, and Quality of Vocal Tones Expressing Happiness and Sadness,” Speech Monographs, vol. 2, pp. 81-137, 1935.
[3] G. Fairbanks and W. Pronovost , “An Experimental Study of the Pitch Characteristics of the Voice during the Expression of Emotion,” Speech Monographs, vol. 6, pp. 87-104, 1939.
[4] C. Williams and K. Stevens , “Emotions and Speech: Some Acoustic Correlates,” J. Acoustical Soc. Am., vol. 52, pp. 1238-1250, 1972.
[5] K.R. Scherer , “Vocal Affect Expression: A Review and a Model for Future Research,” Psychological Bull., vol. 99, pp. 143-165, 1986.
[6] C. Whissell , “The Dictionary of Affect in Language,” Emotion: Theory, Research and Experience, vol. 4, The Measurement of Emotions, R. Plutchik and H. Kellerman, eds., pp. 113-131, Academic Press, 1989.
[7] R. Picard , Affective Computing. MIT Press, 1997.
[8] R. Cowie , E. Douglas-Cowie , N. Tsapatsoulis , G. Votsis , S. Kollias , W. Fellenz , and J. Taylor , “Emotion Recognition in Human-Computer Interaction,” IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 32-80, 2001.
[9] E. Shriberg , “Spontaneous Speech: How People Really Talk and Why Engineers Should Care,” Proc. EUROSPEECH, pp. 1781-1784, 2005.
[10] C.M. Lee and S.S. Narayanan , “Toward Detecting Emotions in Spoken Dialogs,” IEEE Trans. Speech and Audio Processing, vol. 13, no. 2, pp. 293-303, 2005.
[11] M. Schröder , L. Devillers , K. Karpouzis , J.-C. Martin , C. Pelachaud , C. Peter , H. Pirker , B. Schuller , J. Tao , and I. Wilson , “What Should a Generic Emotion Markup Language Be Able to Represent?” Proc. Second Int'l Conf. Affective Computing and Intelligent Interaction, pp. 440-451, 2007.
[12] A. Wendemuth , J. Braun , B. Michaelis , F. Ohl , D. Rösner , H. Scheich , and R. Warnem , “Neurobiologically Inspired, Multimodal Intention Recognition for Technical Communication Systems (NIMITEK),” Proc. Fourth IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-based Systems, pp. 141-144, 2008.
[13] M. Schröder , R. Cowie , D. Heylen , M. Pantic , C. Pelachaud , and B. Schuller , “Towards Responsive Sensitive Artificial Listeners,” Proc. Fourth Int'l Workshop on Human-Computer Conversation, 2008.
[14] Z. Zeng , M. Pantic , G.I. Roisman , and T.S. Huang , “A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, Jan. 2009.
[15] B. Schuller , R. Müller , B. Hörnler , A. Höthker , H. Konosu , and G. Rigoll , “Audiovisual Recognition of Spontaneous Interest within Conversations,” Proc. Int'l Conf. Multimodal Interfaces, pp. 30-37, 2007.
[16] S. Steidl , Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech. Logos Verlag, 2009.
[17] E. Douglas-Cowie , R. Cowie , I. Sneddon , C. Cox , O. Lowry , M. McRorie , J.-C. Martin , L. Devillers , S. Abrilian , A. Batliner , N. Amir , and K. Karpouzis , “The HUMAINE Database: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data,” Proc. Int'l Conf. Affective Computing and Intelligent Interaction, A. Paiva, R. Prada, and R.W. Picard, eds., pp. 488-500, 2007.
[18] M. Wöllmer , F. Eyben , S. Reiter , B. Schuller , C. Cox , E. Douglas-Cowie , and R. Cowie , “Abandoning Emotion Classes— Towards Continuous Emotion Recognition with Modelling of Long-Range Dependencies,” Proc. INTERSPEECH, pp. 597-600, 2008.
[19] S. Steininger , F. Schiel , O. Dioubina , and S. Raubold , “Development of User-State Conventions for the Multimodal Corpus in Smartkom,” Proc. Workshop Multimodal Resources and Multimodal Systems Evaluation, pp. 33-37, 2002.
[20] M. Grimm , K. Kroschel , and S. Narayanan , “The Vera am Mittag German Audio-Visual Emotional Speech Database,” Proc. Int'l Conf. Multimedia & Expo, pp. 865-868, 2008.
[21] L. Devillers , L. Vidrascu , and L. Lamel , “Challenges in Real-Life Emotion Annotation and Machine Learning Based Detection,” Neural Networks, vol. 18, no. 4, pp. 407-422, 2005.
[22] L. Devillers and L. Vidrascu , “Real-Life Emotion Recognition in Speech,” Speaker Classification II, pp. 34-42, Sept. 2007.
[23] A. Batliner , D. Seppi , B. Schuller , S. Steidl , T. Vogt , J. Wagner , L. Devillers , L. Vidrascu , N. Amir , and V. Aharonson , “Patterns, Prototypes, Performance,” Proc. HSS-Cooperation Seminar Pattern Recognition in Medical and Health Eng., J. Hornegger, K. Höller, P. Ritt, A. Borsdorf, and H.P. Niedermeier, eds., pp. 85-86, 2008.
[24] B. Schuller , R. Müller , F. Eyben , J. Gast , B. Hörnler , M. Wöllmer , G. Rigoll , A. Höthker , and H. Konosu , “Being Bored? Recognising Natural Interest by Extensive Audiovisual Integration for Real-Life Application,” Image and Vision Computing J., vol. 27, no. 12, pp. 1760-1774, 2009.
[25] S. Tsakalidis and W. Byrne , “Acoustic Training from Heterogeneous Data Sources: Experiments in Mandarin Conversational Telephone Speech Transcription,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, 2005.
[26] S. Tsakalidis , “Linear Transforms in Automatic Speech Recognition: Estimation Procedures and Integration of Diverse Acoustic Data,” PhD dissertation, 2005.
[27] D. Gildea , “Corpus Variation and Parser Performance,” Proc. Conf. Empirical Methods in Natural Language Processing, pp. 167-202, 2001.
[28] Y. Yang , T. Ault , and T. Pierce , “Combining Multiple Learning Strategies for Effective Cross Validation,” Proc. 17th Int'l Conf. Machine Learning, pp. 1167-1174, 2000.
[29] R. Barzilay and L. Lee , “Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment,” Proc. Human Language Technology Conf.-North Am. Chapter Assoc. Computational Linguistics Conf., pp. 16-23, 2003.
[30] K. Soenmez , M. Plauche , E. Shriberg , and H. Franco , “Consonant Discrimination in Elicited and Spontaneous Speech: A Case for Signal-Adaptive Front Ends in ASR,” Proc. Int'l Conf. Spoken Language Processing, pp. 548-551, 2000.
[31] M. Shami and W. Verhelst , “Automatic Classification of Emotions in Speech Using Multi-Corpora Approaches,” Proc. Second Ann. IEEE BENELUX/DSP Valley Signal Processing Symp., pp. 3-6, 2006.
[32] M. Shami and W. Verhelst , “Automatic Classification of Expressiveness in Speech: A Multi-Corpus Study,” Speaker Classification II, C. Müller, ed., pp. 43-56, 2007.
[33] E. Douglas-Cowie , N. Campbell , R. Cowie , and P. Roach , “Emotional Speech: Towards a New Generation of Databases,” Speech Comm., vol. 40, nos. 1-2, pp. 33-60, 2003.
[34] D. Ververidis and C. Kotropoulos , “A Review of Emotional Speech Databases,” Proc. Panhellenic Conf. Informatics, pp. 560-574, 2003.
[35] I.S. Engberg and A.V. Hansen , “Documentation of the Danish Emotional Speech Database DES,” technical report, Center for PersonKommunikation, Aalborg Univ., Denmark, 2007.
[36] F. Burkhardt , A. Paeschke , M. Rolfes , W. Sendlmeier , and B. Weiss , “A Database of German Emotional Speech,” Proc. INTERSPEECH, pp. 1517-1520, 2005.
[37] J. Hansen and S. Bou-Ghazale , “Getting Started with SUSAS: A Speech under Simulated and Actual Stress Database,” Proc. EUROSPEECH, vol. 4, pp. 1743-1746, 1997.
[38] O. Martin , I. Kotsia , B. Macq , and I. Pitas , “The eNTERFACE'05 Audio-Visual Emotion Database,” Proc. IEEE Workshop Multimedia Database Management, 2006.
[39] D. Ververidis and C. Kotropoulos , “Fast Sequential Floating Forward Selection Applied to Emotional Speech Features Estimated on DES and SUSAS Data Collection,” Proc. European Signal Processing Conf., 2006.
[40] D. Datcu and L.J. Rothkrantz , “The Recognition of Emotions from Speech Using Gentleboost Classifier. A Comparison Approach,” Proc. Int'l Conf. Computer Systems and Technologies, vol. 1, pp. 1-6, 2006.
[41] B. Schuller , D. Seppi , A. Batliner , A. Meier , and S. Steidl , “Towards More Reality in the Recognition of Emotional Speech,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 941-944, 2007.
[42] H. Meng , J. Pittermann , A. Pittermann , and W. Minker , “Combined Speech-Emotion Recognition for Spoken Human-Computer Interfaces,” Proc. IEEE Int'l Conf. Signal Processing and Comm., 2007.
[43] V. Slavova , W. Verhelst , and H. Sahli , “A Cognitive Science Reasoning in Recognition of Emotions in Audio-Visual Speech,” Int'l J. Information Technologies and Knowledge, vol. 2, pp. 324-334, 2008.
[44] B. Schuller , M. Wimmer , L. Mösenlechner , C. Kern , and G. Rigoll , “Brute-Forcing Hierarchical Functionals for Paralinguistics: A Waste of Feature Space?” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 4501-4504, 2008.
[45] D. Datcu and L.J. M. Rothkrantz , “Semantic Audio-Visual Data Fusion for Automatic Emotion Recognition,” Proc. Euromedia, 2008.
[46] M. Mansoorizadeh and N.M. Charkari , “Bimodal Person-Dependent Emotion Recognition Comparison of Feature Level and Decision Level Information Fusion,” Proc. First Int'l Conf. Pervasive Technologies Related to Assistive Environments, pp. 1-4, 2008.
[47] M. Paleari , R. Benmokhtar , and B. Huet , “Evidence Theory-Based Multimodal Emotion Recognition,” Proc. 15th Int'l Multimedia Modeling Conf. on Advances in Multimedia Modeling, pp. 435-446, 2008.
[48] D. Cairns and J.H. L. Hansen , “Nonlinear Analysis and Detection of Speech under Stressed Conditions,” J. Acoustical Soc. Am., vol. 96, no. 6, pp. 3392-3400, Dec. 1994.
[49] L. ten Bosch , “Emotions: What Is Possible in the ASR Framework?” Proc. ISCA Workshop Speech and Emotion, pp. 189-194, 2000.
[50] G. Zhou , J.H.L. Hansen , and J.F. Kaiser , “Nonlinear Feature Based Classification of Speech under Stress,” IEEE Trans. Speech and Audio Processing, vol. 9, no. 3, pp. 201-216, Mar. 2001.
[51] R.S. Bolia and R.E. Slyh , “Perception of Stress and Speaking Style for Selected Elements of the SUSAS Database,” Speech Comm., vol. 40, no. 4, pp. 493-501, 2003.
[52] B. Schuller , M. Wimmer , D. Arsic , T. Moosmayr , and G. Rigoll , “Detection of Security Related Affect and Behaviour in Passenger Transport,” Proc. INTERSPEECH, pp. 265-268, 2008.
[53] L. He , M. Lech , N. Maddage , and N. Allen , “Stress and Emotion Recognition Based on Log-Gabor Filter Analysis of Speech Spectrograms,” Proc. Int'l Conf. Affective Computing and Intelligent Interaction, 2009.
[54] B. Schuller , N. Köhler , R. Müller , and G. Rigoll , “Recognition of Interest in Human Conversational Speech,” Proc. INTERSPEECH, pp. 793-796, 2006.
[55] B. Vlasenko , B. Schuller , K. Tadesse Mengistu , and G. Rigoll , “Balancing Spoken Content Adaptation and Unit Length in the Recognition of Emotion and Interest,” Proc. INTERSPEECH, pp. 805-808, 2008.
[56] W. Wahlster , “Smartkom: Symmetric Multimodality in an Adaptive and Reusable Dialogue Shell,” Proc. Human Computer Interaction Status Conf., pp. 47-62, 2003.
[57] D. Oppermann , F. Schiel , S. Steininger , and N. Beringer , “Off-Talk —A Problem for Human-Machine-Interaction?” Proc. EUROSPEECH, pp. 2197-2200, 2001.
[58] A. Schweitzer , N. Braunschweiler , T. Klankert , B. Säuberlich , and B. Möbius , “Restricted Unlimited Domain Synthesis,” Proc. EUROSPEECH, pp. 1321-1324, 2003.
[59] T. Vogt and E. André , “Improving Automatic Emotion Recognition from Speech via Gender Differentiation,” Proc. Int'l Conf. Language Resources and Evaluation, 2006.
[60] R. Banse and K.R. Scherer , “Acoustic Profiles in Vocal Emotion Expression,” J. Personality and Social Psychology, vol. 70, no. 3, pp. 614-636, 1996.
[61] Y. Li and Y. Zhao , “Recognizing Emotions in Speech Using Short-Term and Long-Term Features,” Proc. Int'l Conf. Spoken Language Processing, p. 379, 1998.
[62] G. Zhou , J.H.L. Hansen , and J.F. Kaiser , “Linear and Nonlinear Speech Feature Analysis for Stress Classification,” Proc. Int'l Conf. Spoken Language Processing, 1998.
[63] T.L. Nwe , S.W. Foo , and L.C. De Silva , “Classification of Stress in Speech Using Linear and Nonlinear Features,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 2, pp. II-9-12, 2003.
[64] B. Schuller , G. Rigoll , and M. Lang , “Hidden Markov Model-Based Speech Emotion Recognition,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 1-4, 2003.
[65] C.M. Lee , S. Yildirim , M. Bulut , A. Kazemzadeh , C. Busso , Z. Deng , S. Lee , and S. Narayanan , “Emotion Recognition Based on Phoneme Classes,” Proc. Int'l Conf. Spoken Language Processing, 2004.
[66] B. Vlasenko and A. Wendemuth , “Tuning Hidden Markov Model for Speech Emotion Recognition,” Proc. DAGA, Mar. 2007.
[67] D. Ververidis and C. Kotropoulos , “Automatic Speech Classification to Five Emotional States Based on Gender Information,” Proc. EUSIPCO, pp. 341-344, 2004.
[68] B. Schuller , S. Steidl , and A. Batliner , “The INTERSPEECH 2009 Emotion Challenge,” Proc. INTERSPEECH, 2009.
[69] R. Barra , J.M. Montero , J. Macias-Guarasa , L.F. D'Haro , R. San-Segundo , and R. Cordoba , “Prosodic and Segmental Rubrics in Emotion Identification,” Proc. Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 1, 2006.
[70] B. Schuller , A. Batliner , D. Seppi , S. Steidl , T. Vogt , J. Wagner , L. Devillers , L. Vidrascu , N. Amir , L. Kessous , and V. Aharonson , “The Relevance of Feature Type for the Automatic Classification of Emotional User States: Low Level Descriptors and Functionals,” Proc. INTERSPEECH, pp. 2253-2256, 2007.
[71] M. Lugger and B. Yang , “An Incremental Analysis of Different Feature Groups in Speaker Independent Emotion Recognition,” Proc. Int'l Congress Phonetic Sciences, pp. 2149-2152, Aug. 2007.
[72] B. Schuller , M. Wöllmer , F. Eyben , and G. Rigoll , The Role of Prosody in Affective Speech, pp. 285-307. Peter Lang Publishing Group, 2009.
[73] B. Schuller , M. Wimmer , L. Mösenlechner , C. Kern , D. Arsic , and G. Rigoll , “Brute-Forcing Hierarchical Functionals for Paralinguistics: A Waste of Feature Space?” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, Apr. 2008.
[74] L. Devillers , L. Lamel , and I. Vasilescu , “Emotion Detection in Task-Oriented Spoken Dialogs,” Proc. Int'l Conf. Multimedia & Expo, July 2003.
[75] B. Schuller , G. Rigoll , and M. Lang , “Speech Emotion Recognition Combining Acoustic Features and Linguistic Information in a Hybrid Support Vector Machine-Belief Network Architecture,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 1, 2004.
[76] B. Schuller , R. Jiménez Villar , G. Rigoll , and M. Lang , “Meta-Classifiers in Acoustic and Linguistic Feature Fusion-Based Affect Recognition,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 325-328, 2005.
[77] T. Athanaselis , S. Bakamidis , I. Dologlou , R. Cowie , E. Douglas-Cowie , and C. Cox , “ASR for Emotional Speech: Clarifying the Issues and Enhancing Performance,” Neural Networks, vol. 18, pp. 437-444, 2005.
[78] A. Batliner , B. Schuller , S. Schaeffler , and S. Steidl , “Mothers, Adults, Children, Pets—Towards the Acoustics of Intimacy,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 4497-4500, 2008.
[79] B. Schuller , “Speaker, Noise, and Acoustic Space Adaptation for Emotion Recognition in the Automotive Environment,” Tagungsband 8. ITG-Fachtagung Sprachkommunikation 2008, vol. ITG 211, VDE, 2008.
[80] B. Schuller , G. Rigoll , S. Can , and H. Feussner , “Emotion Sensitive Speech Control for Human-Robot Interaction in Minimal Invasive Surgery,” Proc. 17th Int'l Symp. Robot and Human Interactive Comm., pp. 453-458, 2008.
[81] B. Schuller , B. Vlasenko , D. Arsic , G. Rigoll , and A. Wendemuth , “Combining Speech Recognition and Acoustic Word Emotion Models for Robust Text-Independent Emotion Recognition,” Proc. Int'l Conf. Multimedia & Expo, 2008.
[82] B. Vlasenko , B. Schuller , A. Wendemuth , and G. Rigoll , “On the Influence of Phonetic Content Variation for Acoustic Emotion Recognition,” Proc. Fourth IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, 2008.
[83] B. Schuller , B. Vlasenko , R. Minguez , G. Rigoll , and A. Wendemuth , “Comparing One and Two-Stage Acoustic Modeling in the Recognition of Emotion in Speech,” Proc. IEEE Workshop Automatic Speech Recognition and Understanding, pp. 596-600, 2007.
[84] B. Vlasenko , B. Schuller , A. Wendemuth , and G. Rigoll , “Combining Frame and Turn-Level Information for Robust Recognition of Emotions within Speech,” Proc. INTERSPEECH, pp. 2249-2252, 2007.
[85] B. Vlasenko , B. Schuller , A. Wendemuth , and G. Rigoll , “Frame vs. Turn-Level: Emotion Recognition from Speech Considering Static and Dynamic Processing,” Proc. Int'l Conf. Affective Computing and Intelligent Interaction, A. Paiva, ed., pp. 139-147, 2007.
[86] A. Batliner , S. Steidl , B. Schuller , D. Seppi , K. Laskowski , T. Vogt , L. Devillers , L. Vidrascu , N. Amir , L. Kessous , and V. Aharonson , “Combining Efforts for Improving Automatic Classification of Emotional User States,” Proc. Fifth Slovenian and First Int'l Language Technologies Conf., pp. 240-245, 2006.
[87] D. Ververidis and C. Kotropoulos , “Emotional Speech Recognition: Resources, Features, and Methods,” Speech Comm., vol. 48, no. 9, pp. 1162-1181, Sept. 2006.
[88] R. Fernandez and R.W. Picard , “Modeling Drivers' Speech under Stress,” Speech Comm., vol. 40, nos. 1-2, pp. 145-159, 2003.
[89] C. Lee , C. Busso , S. Lee , and S. Narayanan , “Modeling Mutual Influence of Interlocutor Emotion States in Dyadic Spoken Interactions,” Proc. INTERSPEECH, pp. 1983-1986, 2009.
[90] I. Cohen , N. Sebe , F.G. Cozman , M.C. Cirelo , and T.S. Huang , “Learning Bayesian Network Classifiers for Facial Expression Recognition Using Both Labeled and Unlabeled Data,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 595-601, June 2003.
[91] M. Slaney and G. McRoberts , “Baby Ears: A Recognition System for Affective Vocalizations,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 2, pp. 985-988, 1998.
[92] C. Lee , E. Mower , C. Busso , S. Lee , and S. Narayanan , “Emotion Recognition Using a Hierarchical Binary Decision Tree Approach,” Proc. INTERSPEECH, pp. 320-323, 2009.
[93] T. Iliou and C.-N. Anagnostopoulos , “Comparison of Different Classifiers for Emotion Recognition,” Proc. Panhellenic Conf. Informatics, pp. 102-106, 2009.
[94] F. Dellaert , T. Polzin , and A. Waibel , “Recognizing Emotions in Speech,” Proc. Int'l Conf. Spoken Language Processing, vol. 3, pp. 1970-1973, 1996.
[95] A. Batliner , S. Steidl , B. Schuller , D. Seppi , K. Laskowski , T. Vogt , L. Devillers , L. Vidrascu , N. Amir , L. Kessous , and V. Aharonson , “Combining Efforts for Improving Automatic Classification of Emotional User States,” Proc. Fifth Slovenian and First Int'l Language Technologies Conf., pp. 240-245, 2006.
[96] F. Eyben , M. Wöllmer , and B. Schuller , “openEAR—Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit,” Proc. Int'l Conf. Affective Computing and Intelligent Interaction, pp. 576-581, 2009.
[97] B. Schuller , M. Lang , and G. Rigoll , “Robust Acoustic Speech Emotion Recognition by Ensembles of Classifiers,” Proc. DAGA, vol. I, pp. 329-330, 2005.
[98] D. Morrison , R. Wang , and L.C.D. Silva , “Ensemble Methods for Spoken Emotion Recognition in Call-Centres,” Speech Comm., vol. 49, no. 2, pp. 98-112, 2007.
[99] M. Kockmann , L. Burget , and J. Cernocky , “Brno University of Technology System for Interspeech 2009 Emotion Challenge,” Proc. INTERSPEECH, 2009.
[100] B. Schuller , D. Arsic , F. Wallhoff , and G. Rigoll , “Emotion Recognition in the Noise Applying Large Acoustic Feature Sets,” Proc. Speech Prosody, 2006.
[101] F. Eyben , B. Schuller , and G. Rigoll , “Wearable Assistance for the Ballroom-Dance Hobbyist—Holistic Rhythm Analysis and Dance-Style Classification,” Proc. Int'l Conf. Multimedia & Expo, 2007.
[102] I.H. Witten and E. Frank , Data Mining: Practical Machine Learning Tools and Techniques, second ed. Morgan Kaufmann, 2005.
[103] M. Grimm , K. Kroschel , and S. Narayanan , “Support Vector Regression for Automatic Recognition of Spontaneous Emotions in Speech,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 4, 2007.
[104] S. Steidl , M. Levit , A. Batliner , E. Nöth , and H. Niemann , “`Of All Things the Measure Is Man': Automatic Classification of Emotions and Inter-Labeler Consistency,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 317-320, 2005.
[105] L. Gillick and S.J. Cox , “Some Statistical Issues in the Comparison of Speech Recognition Algorithms,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, vol. I, pp. 23-26, 1989.
[106] M. Brendel , R. Zaccarelli , B. Schuller , and L. Devillers , “Towards Measuring Similarity between Emotional Corpora,” Proc. Third Int'l Workshop EMOTION (satellite of LREC): Corpora for Research on Emotion and Affect, pp. 58-64, 2010.