2017 IEEE Virtual Reality (VR) (2017)
Los Angeles, CA, USA
March 18, 2017 to March 22, 2017
ISSN: 2375-5334
ISBN: 978-1-5090-6648-3
pp: 112-121
Ran Luo, Department of Electrical and Computer Engineering, The University of New Mexico, USA
Qiang Fang, Institute of Linguistics, Chinese Academy of Social Sciences, China
Jianguo Wei, School of Software, Tianjin University, China
Wenhuan Lu, School of Software, Tianjin University, China
Weiwei Xu, State Key Laboratory of CAD & CG, Zhejiang University, China
Yin Yang, Department of Electrical and Computer Engineering, The University of New Mexico, USA
ABSTRACT
We propose an acoustic-VR system that converts acoustic signals of human speech (Chinese) into realistic 3D tongue animation sequences in real time. Directly capturing the 3D geometry of the tongue at a frame rate that matches its swift movement during speech production is known to be challenging. We handle this difficulty by using electromagnetic articulography (EMA) sensors as an intermediate medium that links the acoustic data to the simulated virtual reality. We train a deep neural network on 1,108 utterances to map the input acoustic signals to the positions of predefined EMA sensors. We then develop a novel reduced physics-based dynamics model for simulating the tongue's motion. Unlike existing methods, our deformable model is nonlinear, volume-preserving, and accommodates collision between the tongue and the oral cavity (mostly with the jaw). The tongue's deformation can be highly localized, which poses extra difficulties for existing spectral model reduction methods. Instead, we adopt a spatial reduction method that allows an expressive subspace representation of the tongue's deformation. We systematically evaluate the simulated tongue shapes against real-world shapes acquired by MRI/CT. Our experiments demonstrate that the proposed system delivers a realistic visual tongue animation corresponding to a user's speech signal.
INDEX TERMS
Tongue, Speech, Magnetic resonance imaging, Hidden Markov models, Solid modeling, Real-time systems, Three-dimensional displays
CITATION

R. Luo, Q. Fang, J. Wei, W. Lu, W. Xu and Y. Yang, "Acoustic VR in the mouth: A real-time speech-driven visual tongue system," 2017 IEEE Virtual Reality (VR), Los Angeles, CA, USA, 2017, pp. 112-121.
doi:10.1109/VR.2017.7892238