Los Angeles, CA
March 31, 2009 to April 2, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CSIE.2009.113
This paper proposes a new approach to emotion recognition based on a hybrid of hidden Markov models (HMMs) and an artificial neural network (ANN), using both utterance-level and segment-level information from speech. To combine the dynamic time warping capability of HMMs with the pattern recognition capability of the ANN, the utterance is viewed as a series of voiced segments. Feature vectors extracted from the segments are normalized into a fixed number of coefficients using orthogonal polynomial methods, and distortions are then calculated to serve as one input to the ANN. Meanwhile, the utterance as a whole is modeled by HMMs, and the likelihood probabilities derived from the HMMs are normalized to serve as another input to the ANN. Experiments on the Beihang University Database of Emotional Speech (BHUDES) and the Berlin database of emotional speech, comparing isolated HMMs with the hybrid HMM/ANN, show that the proposed approach is more effective, reaching an average recognition rate of 81.7% over five emotion states.
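The two ANN inputs described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the polynomial basis, the polynomial order, how distortions are computed, or how likelihoods are normalized. The sketch assumes a Legendre basis of order 3, replaces the distortion step with a plain average of the per-segment coefficients, and assumes a softmax over per-emotion HMM log-likelihoods. The function names `contour_to_coeffs`, `normalize_loglik`, and `build_ann_input` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def contour_to_coeffs(contour, order=3):
    """Map a variable-length feature contour to order + 1 fixed coefficients.

    Assumption: Legendre polynomials on [-1, 1]; the paper's basis may differ.
    """
    contour = np.asarray(contour, dtype=float)
    # Rescale the time axis so segments of any length share one domain.
    x = np.linspace(-1.0, 1.0, len(contour))
    return legendre.legfit(x, contour, order)


def normalize_loglik(loglik):
    """Turn per-emotion HMM log-likelihoods into values that sum to 1.

    Assumption: softmax normalization; the abstract only says "normalized".
    """
    loglik = np.asarray(loglik, dtype=float)
    shifted = loglik - loglik.max()  # subtract the max for numerical stability
    p = np.exp(shifted)
    return p / p.sum()


def build_ann_input(segment_contours, hmm_loglik, order=3):
    """Concatenate segment-level and utterance-level information.

    Assumption: segment coefficients are averaged as a stand-in for the
    distortion computation, which the abstract does not specify.
    """
    coeffs = np.array([contour_to_coeffs(c, order) for c in segment_contours])
    segment_part = coeffs.mean(axis=0)
    utterance_part = normalize_loglik(hmm_loglik)
    return np.concatenate([segment_part, utterance_part])


# Two voiced segments of different lengths and five hypothetical HMM scores.
contours = [np.linspace(0.0, 1.0, 10), np.linspace(1.0, 0.0, 25)]
ann_input = build_ann_input(contours, [-120.0, -115.0, -130.0, -118.0, -125.0])
```

The resulting vector has a fixed length (here 4 coefficients plus 5 normalized likelihoods) regardless of how long each voiced segment is, which is the property that lets a fixed-size ANN input layer consume variable-length speech.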
speech emotion recognition, multi-level, HMM, ANN
Xia Mao, Lijiang Chen, Liqin Fu, "Multi-level Speech Emotion Recognition Based on HMM and ANN", 2009 WRI World Congress on Computer Science and Information Engineering (CSIE 2009), pp. 225-229, doi:10.1109/CSIE.2009.113