Language Engineering Conference (LEC'02)
Towards Indian Language Spell-checker Design
Hyderabad, India
December 13-December 15
ISBN: 0-7695-1885-0
Bidyut Baran Chaudhuri, Indian Statistical Institute
This paper deals with the development of spell-checkers for Indian languages, with Bangla, the second most popular language in the Indian subcontinent, as an example. It briefly reviews the problems and the current state of Indian language spell-checkers and then elaborates an approach for Bangla that works in two stages. The first stage handles phonetic similarity errors. Phonetically similar characters are mapped to a single character code, and a new dictionary Dc is constructed over this reduced alphabet. A phonetically similar but wrongly spelt word can be easily corrected using this dictionary. The second stage handles errors other than phonetic similarity. Here a wrongly spelt word S of n characters is searched in the dictionary Dc. If S is a nonword, its first k1 characters (k1 ≤ n) will match those of a valid word in Dc (if k1 = n, the word in Dc must be longer than n). A reversed-word dictionary Dr is also generated, in which the characters of each word are stored in reverse order. If the last k2 characters of S match a word in Dr, then a single error is located within the intersection of the first k1+1 and the last k2+1 characters of S. We observed that this region is very small compared to the word length in most cases, and the number of suggested corrections can be drastically reduced using this information. We have used the approach to correct Bangla text, where the problem of inflection is tackled by a simplified morphological analyser. Another problem in Indian languages is the large number of compound words formed by euphony and assimilation; this problem is also carefully tackled.
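The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The phonetic classes in PHONETIC_CLASS are a hypothetical table over a toy Latin alphabet rather than the Bangla character classes. All function names (phonetic_key, build_phonetic_dict, error_region, suggest) are invented for illustration. For simplicity the second stage runs on the plain word list and scans it linearly, where a real system would likely use the reduced dictionary and trie-like structures for Dc and Dr. The final filter is only a rough necessary condition for a single-error correction, not a full edit-distance check.

```python
# Hypothetical phonetic classes over a toy Latin alphabet
# (an assumption for illustration, not the paper's Bangla table).
PHONETIC_CLASS = {"c": "k", "q": "k", "z": "s", "y": "i"}


def phonetic_key(word):
    """Stage 1: collapse phonetically similar characters into one code each."""
    return "".join(PHONETIC_CLASS.get(ch, ch) for ch in word)


def build_phonetic_dict(words):
    """Reduced-alphabet dictionary (the role of Dc): key -> valid spellings."""
    dc = {}
    for w in words:
        dc.setdefault(phonetic_key(w), []).append(w)
    return dc


def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def error_region(s, words):
    """Stage 2: intersect the first k1+1 and last k2+1 characters of s.

    Returns 0-based inclusive bounds (lo, hi), or None if the two
    spans do not overlap (e.g. more than one error).
    """
    n = len(s)
    # k1: longest prefix of s matching the start of some dictionary word.
    k1 = max((common_prefix_len(s, w) for w in words), default=0)
    # k2: same on reversed strings, standing in for the reversed dictionary Dr.
    k2 = max((common_prefix_len(s[::-1], w[::-1]) for w in words), default=0)
    lo = max(0, n - k2 - 1)
    hi = min(k1, n - 1)
    return (lo, hi) if lo <= hi else None


def suggest(s, words):
    """Try the phonetic stage first, then filter by the error region."""
    hits = build_phonetic_dict(words).get(phonetic_key(s))
    if hits:
        return hits
    region = error_region(s, words)
    if region is None:
        return []
    lo, hi = region
    head, tail = s[:lo], s[hi + 1:]
    # Keep words that agree with s outside the suspected error region.
    return [
        w for w in words
        if abs(len(w) - len(s)) <= 1
        and len(w) >= len(head) + len(tail)
        and w.startswith(head)
        and w.endswith(tail)
    ]


WORDS = ["cat", "school", "scholar", "spell"]
print(suggest("kat", WORDS))     # phonetic stage
print(suggest("schwol", WORDS))  # error-region stage
```

With this toy word list, "kat" is resolved by the phonetic stage alone. For "schwol" the prefix and suffix matches give k1 = 3 and k2 = 2, so the suspected region shrinks to a single character and only one candidate survives the filter.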
Citation:
Bidyut Baran Chaudhuri, "Towards Indian Language Spell-checker Design," Language Engineering Conference (LEC'02), p. 139, 2002