Most existing research on 3D facial expression recognition has been done using static 3D meshes. 3D videos of a face are believed to carry more information about facial dynamics, which are critical for expression recognition. This paper presents a fully automatic framework that exploits the dynamics of textured 3D videos to recognize six discrete facial expressions. Local video-patches of variable lengths are extracted from numerous locations in the training videos and represented as points on the Grassmannian manifold. An efficient graph-based spectral clustering algorithm is used to cluster these points separately for each expression class. Using a valid Grassmannian kernel function, the resulting cluster centers are embedded into a Reproducing Kernel Hilbert Space (RKHS), where six binary SVM models are learnt. Given a query video, we extract video-patches from it, represent them as points on the manifold, and match these points against the learnt SVM models; a voting-based strategy then decides the class of the query video. The proposed framework is also applied in parallel to 2D videos, and a score-level fusion of the 2D and 3D results is performed to improve the performance of the system. Experimental results on BU4DFE, the largest publicly available 3D video database, show that the system achieves a very high classification accuracy and outperforms current state-of-the-art algorithms for facial expression recognition from 3D videos.
Munawar Hayat, Mohammed Bennamoun, "An Automatic Framework for Textured 3D Video-based Facial Expression Recognition," IEEE Transactions on Affective Computing, no. 1, pp. 1, PrePrints, doi:10.1109/TAFFC.2014.2330580
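The core ideas of the pipeline (patch to Grassmannian point, kernel between points, per-patch decisions combined by voting, and 2D/3D score fusion) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions: each patch is taken to be a matrix whose columns are vectorised frames, and it is mapped to the span of its top `m` left singular vectors. The abstract only says "a valid Grassmannian kernel", so the projection kernel `k(X, Y) = ||X^T Y||_F^2`, a well-known positive definite choice, is used here. The fusion rule is a hypothetical weighted sum. All function names are illustrative, and the spectral clustering and SVM training steps are omitted.

```python
import numpy as np


def patch_to_subspace(patch, m):
    """Map a video-patch to a point on the Grassmannian G(m, d).

    Assumption: `patch` has shape (d, n), one vectorised frame per column.
    Returns a d x m matrix with orthonormal columns spanning the dominant
    m-dimensional subspace (top-m left singular vectors).
    """
    u, _, _ = np.linalg.svd(np.asarray(patch, dtype=float), full_matrices=False)
    return u[:, :m]


def projection_kernel(x, y):
    """Projection kernel ||X^T Y||_F^2 (assumed choice of Grassmannian kernel).

    It depends only on the subspaces, not on the particular orthonormal bases.
    """
    return float(np.linalg.norm(x.T @ y, "fro") ** 2)


def gram_matrix(xs, ys):
    """Kernel matrix between two lists of subspaces, e.g. for a precomputed-kernel SVM."""
    return np.array([[projection_kernel(x, y) for y in ys] for x in xs])


def fuse_scores(scores_3d, scores_2d, w=0.5):
    """Hypothetical score-level fusion: weighted sum of 3D and 2D decision values."""
    return w * np.asarray(scores_3d) + (1.0 - w) * np.asarray(scores_2d)


def vote(patch_scores):
    """Majority vote over patches.

    `patch_scores` has shape (num_patches, num_classes), holding the decision
    values of the per-class binary SVMs; each patch votes for its top class.
    """
    patch_scores = np.asarray(patch_scores)
    winners = np.argmax(patch_scores, axis=1)
    counts = np.bincount(winners, minlength=patch_scores.shape[1])
    return int(np.argmax(counts))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = patch_to_subspace(rng.standard_normal((50, 10)), m=3)
    y = patch_to_subspace(rng.standard_normal((50, 10)), m=3)
    print(gram_matrix([x, y], [x, y]))  # diagonal entries equal m = 3
    scores = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
    print(vote(scores))  # patches vote for classes 0, 1, 0
```

The projection kernel is convenient because `X^T X = I_m` makes every self-similarity equal to `m`, and rotating a basis (`X Q` with orthogonal `Q`) leaves all kernel values unchanged, so the resulting Gram matrix is a function of subspaces only. Such a matrix could be passed to any SVM solver that accepts a precomputed kernel.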