2009 WRI World Congress on Computer Science and Information Engineering
A Cross-Lingual Word Kernel SVM for SMT Training Corpus Selection
Los Angeles, California USA
March 31-April 02
ISBN: 978-0-7695-3507-4
Instead of collecting ever more parallel training corpora, this paper aims to improve statistical machine translation (SMT) performance by exploiting the full potential of existing parallel corpora. Inspired by the mechanism of string subsequence kernels and word sequence kernels, we first propose a cross-lingual word kernel (CWK) SVM that classifies the SMT training corpus into literal translations and free translations, and then use the classified data to train SMT models. One experiment indicates that a larger training corpus does not always lead to higher decoding performance when the additional data are not literal translations. A second experiment shows that properly enlarging the contribution of literal translations can improve SMT performance significantly.
Index Terms:
Cross-lingual, Word Kernel SVM, SMT
Citation:
Xiwu Han, "A Cross-Lingual Word Kernel SVM for SMT Training Corpus Selection," csie, vol. 2, pp.626-630, 2009 WRI World Congress on Computer Science and Information Engineering, 2009