2011 IEEE International Conference on Multimedia and Expo
Reliable accent specific unit generation with dynamic Gaussian mixture selection for multi-accent speech recognition
Barcelona
July 11-July 15
ISBN: 978-1-61284-348-3
Chao Zhang, Center for Speech and Language Technologies, Division of Technology Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Beijing, China
Yi Liu, Center for Speech and Language Technologies, Division of Technology Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Beijing, China
Yunqing Xia, Center for Speech and Language Technologies, Division of Technology Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Beijing, China
Thomas Fang Zheng, Center for Speech and Language Technologies, Division of Technology Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Beijing, China
Jesper Olsen, Nokia Research Center, Beijing, China
JiLei Tian, Nokia Research Center, Beijing, China
Multiple accents are often present in Mandarin speech, as most Chinese speakers have learned Mandarin as a second language. We propose generating reliable accent-specific units together with dynamic Gaussian mixture selection for multi-accent speech recognition. Time-alignment phoneme recognition is used to generate such units and to model accent variations explicitly and accurately. The dynamic Gaussian mixture selection scheme builds a dynamic observation density for each specified frame during decoding, which leads to efficient use of Gaussian mixture components. This method increases coverage of the diverse accent variations in multi-accent speech and alleviates the performance degradation caused by pruned beam search, without increasing the model size. The effectiveness of this approach is evaluated on three typical Chinese accents: Chuan, Yue and Wu. Our approach significantly outperforms the traditional acoustic model reconstruction approach, reducing Syllable Error Rate (SER) by 6.30%, 4.93% and 5.53%, respectively, without degrading performance on standard speech.
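The abstract does not spell out how the per-frame observation density is built, so the following is only a minimal sketch of one plausible reading: for each frame, keep the k best-matching components from a pooled set of Gaussians and score the frame with that reduced mixture. The function names, the `(weight, mean, var)` layout, the top-k criterion, and the weight renormalisation are all assumptions made for illustration and are not taken from the paper.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def select_and_score(x, components, k):
    """Score one frame using only its k best-matching components.

    components: list of (weight, mean, var) tuples, e.g. pooled from a
    standard model and accent-specific units (hypothetical layout).
    Returns (log_likelihood, sorted indices of the selected components).
    """
    # Weighted log score of every candidate component for this frame.
    scored = [
        (math.log(w) + log_gaussian(x, mean, var), i)
        for i, (w, mean, var) in enumerate(components)
    ]
    # Assumed selection rule: keep the k highest-scoring components.
    top = sorted(scored, reverse=True)[:k]
    selected = sorted(i for _, i in top)

    # Renormalise the selected weights so they again sum to one.
    log_weight_sum = math.log(sum(components[i][0] for i in selected))

    # Log-sum-exp over the selected components, for numerical stability.
    best = top[0][0]
    log_lik = best + math.log(sum(math.exp(s - best) for s, _ in top))
    return log_lik - log_weight_sum, selected


# Toy one-dimensional pool: three components with unit variance.
pool = [(0.5, [0.0], [1.0]), (0.3, [5.0], [1.0]), (0.2, [-5.0], [1.0])]
frame = [4.8]
log_lik, chosen = select_and_score(frame, pool, k=1)
```

In this toy case the frame lies close to the second component, so `chosen` contains only index 1 and the frame is scored by that single Gaussian. Because the selection is redone for every frame, the number of components evaluated per frame stays fixed at k regardless of how large the pool is.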
Citation:
Chao Zhang, Yi Liu, Yunqing Xia, Thomas Fang Zheng, Jesper Olsen, JiLei Tian, "Reliable accent specific unit generation with dynamic Gaussian mixture selection for multi-accent speech recognition," icme, pp.1-6, 2011 IEEE International Conference on Multimedia and Expo, 2011