Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
An Approach for Stemming in Symbolically Compressed Indian Language Imaged Documents
Seoul, Korea
August 31 - September 1, 2005
ISBN: 0-7695-2420-6
Utpal Garain, Indian Statistical Institute, India
Alok Kumar Datta, Indian Statistical Institute, India
Stemming is used in many information retrieval (IR) systems to reduce variant word forms to common roots, thereby improving overall retrieval efficiency. This paper presents an algorithm for stemming in the context of a document image retrieval system. The algorithm assumes that the documents are symbolically compressed, and stemming is performed in the compressed domain itself. Experiments have been conducted on Indian language imaged documents, for which efficient OCR still remains a challenging task. Results obtained from a set of 150 document images (in Bangla script, the second most popular script in the Indian sub-continent) consisting of about 12K words show promising performance for the proposed approach.
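As a rough illustration of what "reducing variant word forms to common roots" means, the Python sketch below performs generic longest-match suffix stripping. It is not the authors' algorithm, which the abstract does not describe. It assumes, hypothetically, that each word in a symbolically compressed image is available as a sequence of symbol-class IDs (indices into the compression's symbol dictionary) and that a list of suffix sequences is supplied; the names `strip_suffix`, `group_by_stem`, `min_stem`, and all ID values are illustrative only.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

# Hypothetical: a "symbol" is any hashable ID, e.g. an index into a
# symbolic-compression dictionary of glyph prototypes.
Symbol = Hashable
Word = Tuple[Symbol, ...]


def strip_suffix(word: Sequence[Symbol],
                 suffixes: Iterable[Sequence[Symbol]],
                 min_stem: int = 2) -> Word:
    """Remove the longest suffix in `suffixes` that ends `word`,
    provided at least `min_stem` symbols remain as the stem."""
    w = tuple(word)
    # Try longer suffixes first so that the longest match wins.
    for suf in sorted((tuple(s) for s in suffixes), key=len, reverse=True):
        if not suf:
            continue  # ignore empty suffixes
        if len(w) - len(suf) >= min_stem and w[len(w) - len(suf):] == suf:
            return w[:len(w) - len(suf)]
    return w  # no suffix matched: the word is its own stem


def group_by_stem(words: Iterable[Sequence[Symbol]],
                  suffixes: Sequence[Sequence[Symbol]],
                  min_stem: int = 2) -> Dict[Word, List[Word]]:
    """Map each stem to the list of word forms that reduce to it."""
    groups: Dict[Word, List[Word]] = {}
    for word in words:
        stem = strip_suffix(word, suffixes, min_stem)
        groups.setdefault(stem, []).append(tuple(word))
    return groups


if __name__ == "__main__":
    # Hypothetical symbol-ID sequences for three variants of one root.
    suffixes = [(4,), (9, 4)]
    words = [(12, 7, 30), (12, 7, 30, 4), (12, 7, 30, 9, 4)]
    print(group_by_stem(words, suffixes))
```

With these illustrative inputs, all three word forms fall into a single group keyed by the stem `(12, 7, 30)`. The `min_stem` guard keeps very short words from being reduced to nothing, a common safeguard in rule-based stemmers.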
Citation:
Utpal Garain, Alok Kumar Datta, "An Approach for Stemming in Symbolically Compressed Indian Language Imaged Documents," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 1080-1084, 2005