Most existing research on 3D facial expression recognition relies on static 3D meshes. 3D videos of a face are believed to contain more information about facial dynamics, which are critical for expression recognition. This paper presents a fully automatic framework that exploits the dynamics of textured 3D videos to recognize six discrete facial expressions. Local video-patches of variable lengths are extracted from numerous locations in the training videos and represented as points on the Grassmannian manifold. An efficient graph-based spectral clustering algorithm is used to cluster these points separately for each expression class. Using a valid Grassmannian kernel function, the resulting cluster centers are embedded into a Reproducing Kernel Hilbert Space (RKHS), where six binary SVM models are learnt. Given a query video, we extract video-patches from it, represent them as points on the manifold, and match these points against the learnt SVM models; a voting-based strategy then decides the class of the query video. The proposed framework is also implemented in parallel on 2D videos, and score-level fusion of the 2D and 3D results is performed to improve the system's performance. Experimental results on the largest publicly available 3D video database, BU4DFE, show that the system achieves high classification accuracy and outperforms current state-of-the-art algorithms for facial expression recognition from 3D videos.
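To make the manifold representation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a video-patch is mapped to a Grassmannian point by taking the leading left singular vectors of its flattened frames, and that the "valid Grassmannian kernel" is the projection kernel; the abstract names neither choice. The function names, the subspace dimension `p`, and the patch sizes are hypothetical.

```python
import numpy as np

def patch_to_subspace(patch, p=3):
    """Map a video-patch (frames x height x width) to an orthonormal basis.

    Each frame is flattened into one column. The top-p left singular vectors
    span a p-dimensional subspace, i.e. a point on the Grassmannian.
    The value of p is an assumption made for illustration.
    """
    frames = patch.reshape(patch.shape[0], -1).T  # shape: (pixels, frames)
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return u[:, :p]

def projection_kernel(y1, y2):
    """Projection kernel k(Y1, Y2) = ||Y1^T Y2||_F^2 (assumed kernel choice)."""
    return np.linalg.norm(y1.T @ y2, "fro") ** 2

def majority_vote(patch_labels):
    """Return the most frequent per-patch label as the video-level class."""
    labels, counts = np.unique(patch_labels, return_counts=True)
    return labels[np.argmax(counts)]

# Two patches of different temporal length but identical spatial size.
rng = np.random.default_rng(0)
short_patch = rng.random((5, 8, 8))
long_patch = rng.random((7, 8, 8))

y_short = patch_to_subspace(short_patch)
y_long = patch_to_subspace(long_patch)
similarity = projection_kernel(y_short, y_long)
```

Under these assumptions, patches with different numbers of frames map to subspaces of the same dimension, so they can be compared with a single kernel. A Gram matrix built from such kernel values could then be passed to an SVM that accepts precomputed kernels, and the per-patch predictions combined with `majority_vote`.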
Mohammed Bennamoun, "An Automatic Framework for Textured 3D Video-based Facial Expression Recognition", IEEE Transactions on Affective Computing, no. 1, pp. 1, PrePrints, doi:10.1109/TAFFC.2014.2330580