2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies
Phoneme Based Representation for Vietnamese Web Page Classification
Lyon France
August 22-August 27
ISBN: 978-0-7695-4513-4
This paper proposes a novel text representation for Web pages written in Vietnamese. The representation is based on an analysis of Vietnamese documents at the phonetic level, in which each document is represented as a bag of phonemes. It is designed to capture sound-based information in documents and to help with several non-topic text classification problems: automatic identification of Vietnamese as the language of a document, detection of ancient Vietnamese documents, author identification, and poem identification. We apply several standard machine learning methods, including Naive Bayes (NB), k-nearest neighbors (KNN), and support vector machines (SVMs), to build text classifiers. The experimental results show a significant improvement in both effectiveness and efficiency over the traditional syllable-based representation in most cases.
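To make the bag-of-phonemes idea concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not specify their phoneme inventory, so the sketch approximates sound units from orthography by splitting each written syllable into an onset, a rhyme, and a tone. The names `TONE_MARKS`, `ONSETS`, `split_syllable`, and `bag_of_units`, as well as the `o:`/`r:`/`t:` feature prefixes, are hypothetical and introduced only for illustration.

```python
import re
import unicodedata
from collections import Counter

# Assumption: the five marked tones are read off the combining marks that
# appear after Unicode NFD decomposition; an unmarked syllable is "ngang".
TONE_MARKS = {
    "\u0300": "huyen",  # grave accent
    "\u0301": "sac",    # acute accent
    "\u0309": "hoi",    # hook above
    "\u0303": "nga",    # tilde
    "\u0323": "nang",   # dot below
}

# Assumption: written initial consonants stand in for onset phonemes.
# Sorted longest first so that "ngh" is tried before "ng" and "n".
ONSETS = sorted(
    ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "qu", "th", "tr",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "q", "r",
     "s", "t", "v", "x"],
    key=len,
    reverse=True,
)

def split_syllable(syllable):
    """Split one written syllable into (onset, rhyme, tone)."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # Recompose so vowel-quality marks (as in ê, ô, ơ, ư, ă, â) stay attached.
    base = unicodedata.normalize("NFC", "".join(kept))
    onset = next((o for o in ONSETS if base.startswith(o)), "")
    rhyme = base[len(onset):]
    return onset, rhyme, tone

def bag_of_units(text):
    """Count onset, rhyme, and tone units over all syllables in the text."""
    text = unicodedata.normalize("NFC", text)
    bag = Counter()
    for syllable in re.findall(r"[^\W\d_]+", text):
        onset, rhyme, tone = split_syllable(syllable)
        if onset:
            bag["o:" + onset] += 1
        if rhyme:
            bag["r:" + rhyme] += 1
        bag["t:" + tone] += 1
    return bag

bag = bag_of_units("Tiếng Việt")
```

The resulting counts could be passed as a feature vector to an NB, KNN, or SVM classifier. Because the inventory of onsets, rhymes, and tones is far smaller than the inventory of whole syllables, such a representation would be expected to yield a more compact feature space, though this sketch makes no claim about matching the paper's results.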
Index Terms:
Document representation, Classification
Citation:
Giang-Son Nguyen, Xiaoying Gao, Peter Andreae, "Phoneme Based Representation for Vietnamese Web Page Classification," wi-iat, vol. 1, pp.15-22, 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies, 2011