Character Code Conversion and Misspelled Word Processing in Uyghur, Kazak, Kyrgyz Multilingual Information Retrieval System
Advanced Language Processing and Web Information Technology, International Conference on (2008)
July 23, 2008 to July 25, 2008
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ALPIT.2008.95
Spelling errors often occur in web pages and in user query phrases, and the non-Unicode character coding schemes used by some Uyghur, Kazak, and Kyrgyz language websites seriously degrade the recall and accuracy of a Uyghur, Kazak, and Kyrgyz information retrieval system (UKKIRS). This paper studies these problems and proposes effective solutions. To handle the variety of character coding schemes, it proposes a character code conversion method from non-Unicode to Unicode. For spelling errors, it proposes a reconstruction method and a root-expansion method based on user query phrases. The experimental results indicate that the proposed algorithms solve these problems well and are well suited to the UKKIRS.
Character coding, code conversion, root expansion, candidate suggestion
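The two ideas named in the abstract, code conversion and candidate suggestion for misspelled query words, can be illustrated with a minimal sketch. Nothing here is taken from the paper: the mapping table `LEGACY_TO_UNICODE` is a hypothetical stand-in for a real non-Unicode font encoding, the function names are invented for illustration, and plain Levenshtein distance is swapped in for the paper's reconstruction and root-expansion methods, whose details the abstract does not give.

```python
# Hypothetical table: code points of a made-up legacy (non-Unicode) scheme
# mapped to standard Unicode Arabic-script letters. Not from the paper.
LEGACY_TO_UNICODE = {
    "\uE000": "\u0643",  # ARABIC LETTER KAF
    "\uE001": "\u0649",  # ARABIC LETTER ALEF MAKSURA
    "\uE002": "\u062A",  # ARABIC LETTER TEH
    "\uE003": "\u0627",  # ARABIC LETTER ALEF
    "\uE004": "\u0628",  # ARABIC LETTER BEH
}


def to_unicode(text, table=LEGACY_TO_UNICODE):
    """Replace each legacy character via the table; leave others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def suggest(query, lexicon, max_distance=1):
    """Return lexicon entries within max_distance edits of query, closest first."""
    scored = sorted((edit_distance(query, word), word) for word in lexicon)
    return [word for dist, word in scored if dist <= max_distance]


# Toy Latin-script lexicon, used only to keep the example readable.
toy_lexicon = ["kitab", "mektep", "alma"]
converted = to_unicode("\uE000\uE001\uE002\uE003\uE004")
candidates = suggest("kitap", toy_lexicon)
```

In a retrieval pipeline of this kind, conversion would presumably run once at indexing time so that all pages share one encoding, while suggestion would run at query time against a lexicon of known roots.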
T. Tohti, W. Musajan and A. Hamdulla, "Character Code Conversion and Misspelled Word Processing in Uyghur, Kazak, Kyrgyz Multilingual Information Retrieval System," Advanced Language Processing and Web Information Technology, International Conference on (ALPIT), pp. 139-144, 2008.