Learning Local Responses of Facial Landmarks with Conditional Variational Auto-Encoder for Face Alignment
2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017) (2017)
Washington, DC, USA
May 30, 2017 to June 3, 2017
DOI: http://doi.ieeecomputersociety.org/10.1109/FG.2017.117
This work proposes a novel convolutional neural network architecture that locates facial landmarks accurately by learning their local responses. The network consists of a Conditional Variational Auto-Encoder (CVAE) and a Deep Convolutional Neural Network (DCNN). The CVAE learns response maps of facial landmarks from face images, and the DCNN learns accurate landmark locations from these response maps together with facial textures. The CVAE consists of a face encoder, which extracts high-level information from raw pixels, and a decoder, which produces local response maps from this high-level coding. We formulate the CVAE for capturing local responses as an optimization problem that can be solved through back-propagation. Extensive experiments show that the proposed CVAE learns better local response maps than a Fully Convolutional Network (FCN). Our method outperforms state-of-the-art methods on AFLW (5 points) and on the challenging subset of 300-W (68 points), showing its advantages under complex poses and expressions.
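The abstract does not give the loss function or the exact format of the response maps. The following is a minimal NumPy sketch under stated assumptions: one Gaussian heat map per landmark as the training target, a diagonal-Gaussian latent code with a standard-normal prior, and a squared-error reconstruction term. None of these choices is confirmed by the paper. It illustrates the kind of variational objective (reconstruction plus KL divergence) that back-propagation would minimize, and a naive arg-max read-out of landmark positions. The encoder and decoder networks, the conditioning on the face image, and the DCNN refinement stage are omitted. All function names (`gaussian_response_maps`, `cvae_loss`, `decode_landmarks`, and so on) are hypothetical.

```python
import numpy as np


def gaussian_response_maps(landmarks, height, width, sigma=2.0):
    """Hypothetical target format: one Gaussian map per landmark.

    landmarks: array of shape (K, 2) with (x, y) pixel coordinates.
    Returns shape (K, height, width); each map peaks at 1.0 on its landmark.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.empty((len(landmarks), height, width))
    for k, (x, y) in enumerate(landmarks):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return maps


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps so gradients can flow through mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_diag_gaussian(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions.

    Assumes a standard-normal prior; a full CVAE may instead use a learned,
    image-conditioned prior p(z | x).
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)


def cvae_loss(pred_maps, target_maps, mu, log_var):
    """Negative-ELBO sketch: squared-error reconstruction of the response maps
    plus the KL term that regularizes the latent code."""
    recon = np.sum((pred_maps - target_maps) ** 2)
    return recon + kl_diag_gaussian(mu, log_var)


def decode_landmarks(maps):
    """Coarse landmark estimate: (x, y) of the maximum of each response map."""
    k, h, w = maps.shape
    flat_idx = maps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)


if __name__ == "__main__":
    # Five made-up landmark positions on a 64x64 map (AFLW-style 5-point layout).
    landmarks = np.array([[20, 24], [44, 24], [32, 36], [24, 48], [40, 48]])
    targets = gaussian_response_maps(landmarks, 64, 64)

    rng = np.random.default_rng(0)
    mu, log_var = np.zeros(16), np.zeros(16)
    z = reparameterize(mu, log_var, rng)  # latent code a decoder would consume

    print(decode_landmarks(targets))  # recovers the landmark coordinates above
    print(cvae_loss(targets, targets, mu, log_var))  # 0.0 for a perfect decoder
```

Arg-max decoding yields only pixel-level positions and ignores the surrounding texture. According to the abstract, this is the role of the DCNN stage, which refines landmark locations from the response maps together with the face image.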
S. Liu, Y. Huang, J. Hu and W. Deng, "Learning Local Responses of Facial Landmarks with Conditional Variational Auto-Encoder for Face Alignment," 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 2017, pp. 947-952.