Ninth IEEE Symposium on Computers and Communications 2004 Volume 2 (ISCC'04)
Recognizing emotions for the audio-visual document indexing
Alexandria, Egypt
June 28-July 01
ISBN: 0-7803-8623-X
G. Quenot, Laboratoire CLIPS-IMAG, Grenoble, France
In this paper, we propose using MFCC coefficients (mel-frequency cepstral coefficients) and a simple but efficient classification method, vector quantization (VQ), to perform speaker-dependent emotion recognition. Many other features (energy, pitch, zero crossing, phonetic rate, LPC, etc.) and their derivatives are also tested and combined with the MFCC coefficients in order to find the best combination. Other models, GMM and HMM (discrete and continuous hidden Markov models), are studied as well, in the hope that continuous distributions and the temporal evolution of this feature set will improve the quality of emotion recognition. The accuracy in recognizing five different emotions exceeds 80% using only MFCC coefficients with the VQ model. The approach is simple but efficient, and the result is considerably better than those obtained on the same database in human evaluations, in which listeners judged each sentence without being permitted to go back or to compare sentences (Inger Samso Engberg and Anya Varnich Hansen, 2001).
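The VQ classification idea in the abstract can be illustrated with a minimal sketch: train one codebook per emotion, then label an utterance with the emotion whose codebook quantizes its frames with the lowest average distortion. This is not the authors' implementation. The function names, the plain k-means training, the codebook size, the emotion labels, and the synthetic 12-dimensional vectors standing in for real MFCC frames are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(frames, n_codewords=4, n_iters=10, seed=0):
    """Fit a VQ codebook to a list of feature vectors with plain k-means."""
    rng = random.Random(seed)
    # Initialise the codewords from randomly chosen training frames.
    codebook = [list(f) for f in rng.sample(frames, n_codewords)]
    for _ in range(n_iters):
        cells = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(n_codewords), key=lambda k: sq_dist(f, codebook[k]))
            cells[nearest].append(f)
        for k, members in enumerate(cells):
            if members:  # keep the old codeword if its cell is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def classify(frames, codebooks):
    """Return the label whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda label: distortion(frames, codebooks[label]))

# Synthetic stand-in for MFCC frames (assumption): each hypothetical emotion
# label gets its own cluster centre in a 12-dimensional feature space.
rng = random.Random(42)
labels = ["emotion_a", "emotion_b", "emotion_c", "emotion_d", "emotion_e"]
DIMS = 12
centres = {lab: [rng.gauss(0, 5) for _ in range(DIMS)] for lab in labels}

def fake_utterance(label, n_frames=100):
    """Generate noisy frames around the centre of the given label."""
    return [[c + rng.gauss(0, 1) for c in centres[label]] for _ in range(n_frames)]

# One codebook per emotion, trained on that emotion's frames only.
codebooks = {lab: train_codebook(fake_utterance(lab)) for lab in labels}

# Classify a fresh utterance for each label.
predicted = {lab: classify(fake_utterance(lab), codebooks) for lab in labels}
print(predicted)
```

With real speech, `fake_utterance` would be replaced by MFCC extraction from the recordings of a single speaker, since the setting described above is speaker-dependent.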
Citation:
See-May Phoong, G. Quenot, E. Castelli, "Recognizing emotions for the audio-visual document indexing," ISCC, vol. 2, pp. 580-584, Ninth IEEE Symposium on Computers and Communications 2004 Volume 2 (ISCC'04), 2004