The present study investigated unimodal and multimodal emotion perception by humans, with an eye toward applying the findings to automated affect detection. The focus was on assessing the reliability with which untrained human observers could detect naturalistic expressions of non-basic affective states (boredom, engagement/flow, confusion, frustration, and neutral) from previously recorded videos of learners interacting with a computer tutor. The experiment manipulated three modalities to produce seven conditions: face, speech, context, face+speech, face+context, speech+context, and face+speech+context. Agreement between two observers (OO) and between an observer and a learner (LO) was computed and analyzed with mixed-effects logistic regression models. The results indicated that agreement was generally low (kappas ranged from .030 to .183) but, with one exception, greater than chance. Comparisons of overall agreement (across affective states) between the unimodal and multimodal conditions supported redundancy effects between modalities, but there were superadditive, additive, redundant, and inhibitory effects when affective states were considered individually. Patterns in the OO and LO datasets both converged and diverged; notably, LO models yielded lower agreement but higher multimodal effects than OO models. Implications of the findings for automated affect detection are discussed.
emotion perception, multi-modal recognition, human recognition of emotion
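
The abstract reports agreement as kappa values and compares them against chance. As a minimal sketch of what such a statistic measures, the snippet below computes Cohen's kappa for two raters who label the same items. It assumes Cohen's two-rater formulation, which the abstract does not confirm; the function name `cohens_kappa` and the sample labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa).

    Assumption: this is the standard two-rater formulation; the study's
    exact kappa variant is not stated in the abstract.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement under chance: product of the raters' marginal
    # label proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")

    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments by an observer and a learner (not study data).
observer = ["boredom", "boredom", "confusion", "confusion"]
learner = ["boredom", "confusion", "confusion", "confusion"]
kappa = cohens_kappa(observer, learner)
```

A kappa of 0 corresponds to chance-level agreement and 1 to perfect agreement, so values such as the .030 to .183 range reported above indicate agreement only slightly above chance.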

S. K. D'Mello, N. Dowell and A. Graesser, "Unimodal and Multimodal Human Perception of Naturalistic Non-Basic Affective States during Human-Computer Interactions," in IEEE Transactions on Affective Computing.