2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT) (2012)
Kuala Lumpur
Nov. 26, 2012 to Nov. 28, 2012
ISBN: 978-1-4673-5832-3
pp: 344-348
ABSTRACT
Natural language processing is an emerging technology in the language industry and an aid to human-computer interaction in computer science. Language identification is a form of pattern recognition that identifies the predefined language of a web page and predicts the unknown language of a given text. Written texts are built from common features such as characters, words and n-grams, and the patterns of these features are distinctive for each language. Experimental results show that the supervised n-gram approach identifies languages accurately and outperforms the distance measurement approach on Arabic-script web pages.
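The abstract does not describe the classifier in detail, although the index terms mention support vector machines. The sketch below illustrates supervised character n-gram language identification in general. It is a minimal example that assumes a character-trigram multinomial naive Bayes model with add-one smoothing. This is a swapped-in technique and is not necessarily the authors' method. The names `char_ngrams` and `NGramNaiveBayes` are hypothetical, and the toy English/Malay phrases are illustrative assumptions, not the paper's Arabic-script data.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding the text with one space at each end."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams (illustrative stand-in)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}        # language -> Counter of n-gram frequencies
        self.totals = {}        # language -> total number of n-grams seen
        self.docs = Counter()   # language -> number of training texts
        self.vocab = set()      # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus the summed log likelihoods, using add-one (Laplace) smoothing."""
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        scores = {}
        for lang, counter in self.counts.items():
            score = math.log(self.docs[lang] / n_docs)
            for g in grams:
                score += math.log((counter[g] + 1) / (self.totals[lang] + v))
            scores[lang] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (assumption, for illustration only).
TRAIN_TEXTS = [
    "thank you very much", "good morning everyone", "i like to eat rice",
    "terima kasih banyak", "selamat pagi", "saya suka makan nasi",
]
TRAIN_LABELS = ["en", "en", "en", "ms", "ms", "ms"]

model = NGramNaiveBayes(n=3).fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("terima kasih"))  # expected: ms
```

Scores are accumulated in log space so that long texts do not underflow. Smoothing keeps an unseen n-gram from zeroing out a language's score. A distance-based baseline would instead compare a text's n-gram profile with each language profile and pick the nearest one. The abstract does not say which distance the paper used.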
INDEX TERMS
natural language processing, support vector machines, text analysis, Web sites
CITATION

C. Ng, S. Liew, W. M. Hussin and T. Herawan, "Identifying the Dominant Language of Web Page Using Supervised N-grams," 2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT), Kuala Lumpur, 2013, pp. 344-348.
doi:10.1109/ACSAT.2012.74