HMM-Based Lexicon-Driven and Lexicon-Free Word Recognition for Online Handwritten Indic Scripts
April 2012 (vol. 34 no. 4)
pp. 670-682
S. Madhvanath, Genesys Telecom Labs., Chennai, India
A. Bharath, Hewlett-Packard Labs., Bangalore, India
Research on recognizing online handwritten words in Indic scripts is at an early stage compared to Latin and Oriental scripts. In this paper, we address this problem specifically for two major Indic scripts: Devanagari and Tamil. In contrast to previous approaches, the techniques we propose are largely data driven and script independent. We propose two different techniques for word recognition based on Hidden Markov Models (HMMs): lexicon driven and lexicon free. The lexicon-driven technique models each word in the lexicon as a sequence of symbol HMMs according to a standard symbol writing order derived from the phonetic representation. The lexicon-free technique uses a novel Bag-of-Symbols representation of the handwritten word that is independent of symbol order and allows rapid pruning of the lexicon. On handwritten Devanagari word samples featuring both standard and nonstandard symbol writing orders, a combination of the lexicon-driven and lexicon-free recognizers significantly outperforms either of them used in isolation. In contrast, most Tamil word samples feature the standard symbol order, and the lexicon-driven recognizer outperforms the lexicon-free one as well as their combination. The best recognition accuracies obtained for 20,000-word lexicons are 87.13 percent for Devanagari when the two recognizers are combined, and 91.8 percent for Tamil using the lexicon-driven technique.
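The order-independent lexicon pruning mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Bag-of-Symbols representation is matched against the lexicon, so everything below is an assumption for illustration only: the multiset distance, the `max_distance` threshold, the function names, and the toy romanized lexicon are hypothetical and are not taken from the paper.

```python
from collections import Counter


def bag_of_symbols(symbols):
    """Order-independent multiset of symbol labels (assumed representation)."""
    return Counter(symbols)


def bag_distance(a, b):
    """Count the symbols that must be added or removed to turn bag a into bag b."""
    return sum(((a - b) + (b - a)).values())


def prune_lexicon(recognized_symbols, lexicon, max_distance=1):
    """Keep lexicon words whose bag lies within max_distance of the recognized bag."""
    query = bag_of_symbols(recognized_symbols)
    return [
        word
        for word, symbols in lexicon.items()
        if bag_distance(query, bag_of_symbols(symbols)) <= max_distance
    ]


# Hypothetical toy lexicon: each word maps to its symbols in standard writing order.
lexicon = {
    "kamal": ["ka", "ma", "la"],
    "kalam": ["ka", "la", "ma"],
    "namak": ["na", "ma", "ka"],
    "kal": ["ka", "la"],
}

# Symbols recognized from a word written in a nonstandard order.
written = ["ma", "ka", "la"]
candidates = prune_lexicon(written, lexicon, max_distance=1)
print(candidates)
```

Because the comparison ignores symbol order, a word written in a nonstandard order still matches its lexicon entry. The surviving candidates could then be rescored by an order-sensitive recognizer, which is one way to picture why combining the two recognizers helps on data with mixed writing orders.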

Index Terms:
natural language processing, handwritten character recognition, hidden Markov models, image representation, Tamil word samples, HMM based lexicon driven word recognition, lexicon free word recognition, online handwritten Indic scripts, Latin scripts, oriental scripts, phonetic representation, bag-of-symbols representation, handwritten Devanagari word, symbol writing orders, Writing, Feature extraction, Handwriting recognition, Character recognition, Shape, Ink, Tamil, Online handwriting recognition, word recognition, lexicon driven, lexicon free, bag of symbols, symbol order variation, Devanagari
Citation:
S. Madhvanath, A. Bharath, "HMM-Based Lexicon-Driven and Lexicon-Free Word Recognition for Online Handwritten Indic Scripts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 670-682, April 2012, doi:10.1109/TPAMI.2011.234