Computer-Assisted Audiovisual Language Learning
June 2012 (vol. 45 no. 6)
pp. 38-47
Lijuan Wang, Microsoft Research Asia, Beijing
Yao Qian, Microsoft Research Asia, Beijing
Matthew Scott, Microsoft Research Asia, Beijing
Gang Chen, Microsoft Research Asia, Beijing
Frank Soong, Microsoft Research Asia, Beijing
Advances in speech-processing technology have enabled novel ways to learn a foreign language online. With Engkoo, researchers in China are working to turn any computer into a language-learning assistant and to make searching a language easier. The featured Web extra is a video discussion titled “Computer-Assisted Audiovisual Language Learning” that demonstrates Engkoo, a Web-based computer-assisted audiovisual language-learning service that combines two emerging speech-processing technologies: talking head and phonetic similarity search.

Index Terms:
Engkoo, computing in Asia, computer-assisted language learning, talking head, text-to-speech synthesis, phonetic similarity search
Lijuan Wang, Yao Qian, Matthew Scott, Gang Chen, Frank Soong, "Computer-Assisted Audiovisual Language Learning," Computer, vol. 45, no. 6, pp. 38-47, June 2012, doi:10.1109/MC.2012.152