2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT) (2012)
Nov. 26, 2012 to Nov. 28, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ACSAT.2012.74
Natural language processing is an emerging technology in the language industry and an aid to human-computer interaction in computer science. Language identification is a form of pattern recognition that helps to identify the predefined language of a web page and to predict the unknown language of a given text. Written texts are built from common features such as characters, words and n-grams, and these features are distinctive for each language. The experimental results show that the supervised n-gram approach identifies languages accurately and outperforms the distance measurement on Arabic-script web pages.
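To make the idea of character n-grams as language features concrete, the following is a minimal sketch and not the authors' implementation. It assumes character trigrams, a tiny hand-made training set, and a smoothed log-likelihood score. The names char_ngrams, train, identify and TOY_SAMPLES are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Toy labelled data (an assumption for illustration, not from the paper).
TOY_SAMPLES = [
    ("en", "the quick brown fox jumps over the lazy dog and the cat"),
    ("ms", "saya suka makan nasi goreng dan minum teh di pagi hari"),
]


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of text, padded with one space per side."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count table per language from (label, text) pairs."""
    profiles = {}
    for lang, text in samples:
        profiles.setdefault(lang, Counter()).update(char_ngrams(text, n))
    return profiles


def identify(text, profiles, n=3):
    """Return the label whose profile gives the text the highest smoothed log-likelihood."""
    vocab_size = len(set().union(*profiles.values()))
    grams = char_ngrams(text, n)
    best_lang, best_score = None, -math.inf
    for lang, counts in profiles.items():
        total = sum(counts.values())
        # Add-one smoothing so unseen n-grams do not produce log(0).
        score = sum(
            math.log((counts[g] + 1) / (total + vocab_size + 1)) for g in grams
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


profiles = train(TOY_SAMPLES)
print(identify("the dog and the fox", profiles))
```

The same functions operate on any Unicode string, so Arabic-script text can be passed in unchanged. A realistic system would need far more training text per language than this toy set.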
natural language processing, support vector machines, text analysis, Web sites
C. Ng, S. Liew, W. M. Hussin and T. Herawan, "Identifying the Dominant Language of Web Page Using Supervised N-grams," 2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT), Kuala Lumpur, 2013, pp. 344-348.