Issue No. 04 - Oct.-Dec. (2014 vol. 5)
ISSN: 1949-3045
pp: 377-390
Houwei Cao, Department of Radiology, University of Pennsylvania, Philadelphia, PA
David G. Cooper, Department of Mathematics and Computer Science, Ursinus College, Collegeville, PA
Michael K. Keutmann, Department of Psychology, University of Illinois at Chicago, Chicago, IL
Ruben C. Gur, Neuropsychiatry Section, Department of Psychiatry, University of Pennsylvania
Ani Nenkova, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA
Ragini Verma, Department of Radiology, University of Pennsylvania, Philadelphia, PA
People convey their emotional state in their face and voice. We present an audio-visual dataset uniquely suited for the study of multi-modal emotion expression and perception. The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-value intensity values for the perceived emotion were collected using crowd-sourcing from 2,443 raters. The human recognition rates of intended emotion for the audio-only, visual-only, and audio-visual data are 40.9, 58.2, and 63.6 percent, respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. The accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be well recognized based on evidence from a single modality. The large dataset we introduce can be used to probe other questions concerning the audio-visual perception of emotion.
Visualization, Emotion recognition, Educational institutions, Electronic mail, Speech, Databases, Calibration

H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova and R. Verma, "CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset," in IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 377-390, 2014.