Issue No.06 - June (2012 vol.45)
pp: 38-47
Lijuan Wang, Microsoft Research Asia, Beijing
Yao Qian, Microsoft Research Asia, Beijing
Matthew Scott, Microsoft Research Asia, Beijing
Gang Chen, Microsoft Research Asia, Beijing
Frank Soong, Microsoft Research Asia, Beijing
ABSTRACT
Advances in speech-processing technology have enabled novel ways to learn a foreign language online. With Engkoo, researchers in China are working to turn any computer into a language-learning assistant and to make it easier to search a language. The featured Web extra at http://youtu.be/_VHDMAKKLKo is a video discussion titled “Computer-Assisted Audiovisual Language Learning.” It demonstrates Engkoo, a Web-based computer-assisted audiovisual language-learning service that combines two emerging speech-processing technologies: talking head and phonetic similarity search.
INDEX TERMS
Engkoo, computing in Asia, computer-assisted language learning, talking head, text-to-speech synthesis, phonetic similarity search
CITATION
Lijuan Wang, Yao Qian, Matthew Scott, Gang Chen, Frank Soong, "Computer-Assisted Audiovisual Language Learning," Computer, vol. 45, no. 6, pp. 38-47, June 2012, doi:10.1109/MC.2012.152