Handling Movement Epenthesis and Hand Segmentation Ambiguities in Continuous Sign Language Recognition Using Nested Dynamic Programming
March 2010 (vol. 32 no. 3)
pp. 462-477
Ruiduo Yang, University of South Florida, Tampa
Sudeep Sarkar, University of South Florida, Tampa
Barbara Loeding, University of South Florida Polytechnic, Lakeland
We consider two crucial problems in continuous sign language recognition from unaided video sequences. At the sentence level, we consider the movement epenthesis (me) problem, and at the feature level, we consider the problem of hand segmentation and grouping. We construct a framework that handles both problems based on an enhanced, nested version of the dynamic programming approach. To address movement epenthesis, the dynamic programming (DP) process employs a virtual me option that does not require explicit models; we call this the enhanced level building (eLB) algorithm. This formulation also allows the incorporation of grammar models. Nested within the eLB is another DP that handles the selection among multiple hand candidates. We demonstrate our ideas on four American Sign Language data sets: with simple background, with the signer wearing short sleeves, with complex background, and across signers. We compare performance against Conditional Random Field (CRF) and Latent Dynamic CRF (LDCRF) based approaches, and observe more than 40 percent improvement in frame labeling rate over both. We also show the flexibility of our approach in handling a changing context, and find a 70 percent improvement in sign recognition rate over an unenhanced DP matching algorithm that does not accommodate the me effect.
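At its core, the eLB is a level building DP in which each level assigns one label to a span of frames, and every span may be explained either by a trained sign model or by a virtual "me" label that carries only a simple per-frame penalty, so no explicit epenthesis model is ever trained. The Python sketch below illustrates just that recurrence under stated assumptions: match_cost, me_cost, and sign_models are hypothetical stand-ins, and the grammar constraints and the nested hand-candidate DP of the paper are omitted.

def enhanced_level_building(frames, sign_models, match_cost, me_cost, max_levels):
    # Minimal sketch of the eLB recurrence (hypothetical interfaces):
    #   frames      -- list of per-frame feature vectors
    #   sign_models -- dict mapping sign name -> model (opaque here)
    #   match_cost  -- match_cost(model, span) -> warping cost of a frame span
    #   me_cost     -- per-frame penalty for the virtual 'me' label
    #   max_levels  -- maximum number of labels allowed in the sentence
    T = len(frames)
    INF = float("inf")
    # D[l][t]: best cost of explaining frames[0:t] with exactly l labels
    D = [[INF] * (T + 1) for _ in range(max_levels + 1)]
    back = [[None] * (T + 1) for _ in range(max_levels + 1)]
    D[0][0] = 0.0

    for l in range(1, max_levels + 1):
        for t in range(1, T + 1):
            for s in range(t):  # frames[s:t] receives a single label
                if D[l - 1][s] == INF:
                    continue
                # Option 1: a real sign, scored against its model (e.g., by DTW)
                for name, model in sign_models.items():
                    c = D[l - 1][s] + match_cost(model, frames[s:t])
                    if c < D[l][t]:
                        D[l][t], back[l][t] = c, (s, name)
                # Option 2: the virtual 'me' label -- a flat per-frame penalty,
                # so no explicit epenthesis model is needed
                c = D[l - 1][s] + me_cost * (t - s)
                if c < D[l][t]:
                    D[l][t], back[l][t] = c, (s, "me")

    # Choose the best level count, then trace back the label sequence
    best_l = min(range(1, max_levels + 1), key=lambda l: D[l][T])
    labels, t, l = [], T, best_l
    while t > 0:
        s, name = back[l][t]
        labels.append((name, s, t))
        t, l = s, l - 1
    return list(reversed(labels))

In the paper's nested formulation, the span cost for a sign is itself computed by an inner DP that simultaneously selects among multiple hand candidates per frame; the sketch abstracts that inner loop into the single match_cost callable.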

References:
[1] S.C.W. Ong and S. Ranganath, “Automatic Sign Language Analysis: A Survey and the Future Beyond Lexical Meaning,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 873-891, June 2005.
[2] B. Loeding, S. Sarkar, A. Parashar, and A. Karshmer, “Progress in Automated Computer Recognition of Sign Language,” Lecture Notes in Computer Science, vol. 3118, pp. 1079-1087, Springer, 2004.
[3] L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[4] C. Myers and L. Rabiner, “A Level Building Dynamic Time Warping Algorithm for Connected Word Recognition,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp.284-297, Apr. 1981.
[5] J. Lichtenauer, E. Hendriks, and M. Reinders, “Sign Language Recognition by Combining Statistical DTW and Independent Classification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 2040-2046, Nov. 2008.
[6] M. Skounakis, M. Craven, and S. Ray, “Hierarchical Hidden Markov Models for Information Extraction,” Proc. Int'l Joint Conf. Artificial Intelligence, 2003.
[7] C. Valli and C. Lucas, Linguistics of American Sign Language: A Resource Text for ASL Users. Gallaudet Univ. Press, 1992.
[8] C. Vogler and D. Metaxas, “A Framework for Recognizing the Simultaneous Aspects of American Sign Language,” Computer Vision and Image Understanding, vol. 81, no. 3, pp. 358-384, 2001.
[9] C. Vogler and D. Metaxas, “ASL Recognition Based on a Coupling between HMMs and 3D Motion Analysis,” Proc. Int'l Conf. Computer Vision, pp. 363-369, 1998.
[10] Q. Yuan, W. Gao, H. Yao, and C. Wang, “Recognition of Strong and Weak Connection Models in Continuous Sign Language,” Proc. Int'l Conf. Pattern Recognition, vol. 1, pp. 75-78, 2002.
[11] W. Gao, G. Fang, D. Zhao, and Y. Chen, “Transition Movement Models for Large Vocabulary Continuous Sign Language Recognition,” Proc. IEEE Int'l Conf. Automatic Face and Gesture Recognition, pp. 553-558, 2004.
[12] R. Yang and S. Sarkar, “Detecting Coarticulation in Sign Language Using Conditional Random Fields,” Proc. Int'l Conf. Pattern Recognition, pp. 108-112, 2006.
[13] R. Yang, S. Sarkar, and B.L. Loeding, “Enhanced Level Building Algorithm for the Movement Epenthesis Problem in Sign Language Recognition,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[14] L. Rabiner and B. Juang, Fundamentals of Speech Recognition. PTR Prentice Hall, 1993.
[15] C. Vogler and D. Metaxas, “Handshapes and Movements: Multiple-Channel ASL Recognition,” Lecture Notes in Artificial Intelligence, vol. 2915, pp. 247-258, Springer, 2004.
[16] C. Vogler, H. Sun, and D. Metaxas, “A Framework for Motion Recognition with Application to American Sign Language and Gait Recognition,” Proc. Workshop Human Motion, pp. 33-38, 2000.
[17] C. Wang, W. Gao, and S. Shan, “An Approach Based on Phonemes to Large Vocabulary Chinese Sign Language Recognition,” Proc. IEEE Int'l Conf. Automatic Face and Gesture Recognition, pp. 393-398, 2002.
[18] H. Brashear, V. Henderson, K.-H. Park, H. Hamilton, S. Lee, and T. Starner, “American Sign Language Recognition in Game Development for Deaf Children,” Proc. Int'l ACM SIGACCESS Conf. Computers and Accessibility, pp. 79-86, 2006.
[19] T. Starner and A. Pentland, “Visual Recognition of American Sign Language Using Hidden Markov Models,” Proc. IEEE Int'l Conf. Automatic Face and Gesture Recognition, pp. 189-194, 1995.
[20] M. Kadous, “Machine Recognition of Auslan Signs Using PowerGloves: Towards Large-Lexicon Recognition of Sign Language,” Proc. Workshop on the Integration of Gesture in Language and Speech, pp. 165-174, 1996.
[21] B. Bauer, H. Hienz, and K.-F. Kraiss, “Video-Based Continuous Sign Language Recognition Using Statistical Methods,” Proc. Int'l Conf. Pattern Recognition, vol. 2, pp. 2463-2466, 2000.
[22] B. Bauer and K.-F. Kraiss, “Video-Based Sign Recognition Using Self-Organizing Subunits,” Proc. Int'l Conf. Pattern Recognition, vol. 2, pp. 434-437, 2002.
[23] Y. Cui and J. Weng, “Appearance-Based Hand Sign Recognition from Intensity Image Sequences,” Computer Vision and Image Understanding, vol. 78, no. 2, pp. 157-176, 2000.
[24] L. Ding and A. Martinez, “Recovering the Linguistic Components of the Manual Signs in American Sign Language,” Proc. IEEE Conf. Advanced Video and Signal-Based Surveillance, 2007.
[25] Y. Sato and T. Kobayashi, “Extension of Hidden Markov Models to Deal with Multiple Candidates of Observations and Its Application to Mobile-Robot-Oriented Gesture Recognition,” Proc. Int'l Conf. Pattern Recognition, pp. 515-519, 2002.
[26] J. Alon, V. Athitsos, Q. Yuan, and S. Sclaroff, “Simultaneous Localization and Recognition of Dynamic Hand Gestures,” Proc. IEEE Workshop Motion and Video Computing, vol. 2, pp. 254-260, 2005.
[27] R. Yang and S. Sarkar, “Gesture Recognition Using Hidden Markov Model from Fragmented Observations,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 766-773, 2006.
[28] R. Yang and S. Sarkar, “Coupled Grouping and Matching for Sign and Gesture Recognition,” J. Computer Vision and Image Understanding, vol. 113, no. 6, pp. 663-681, 2009.
[29] R. Wilbur and A. Kak, “Purdue RVL-SLLL American Sign Language Database,” Technical Report 06-12, School of Electrical and Computer Eng., Purdue Univ., http://RVL.ecn.purdue.edu/databaseWilbur_Kak.html , 2006.
[30] H. Silverman and D. Morgan, “The Application of Dynamic Programming to Connected Speech Recognition,” IEEE Acoustics, Speech, and Signal Processing Magazine, vol. 26, no. 6, pp. 575-582, July 1990.
[31] M. Jones and J. Rehg, “Statistical Color Models with Application to Skin Detection,” Int'l J. Computer Vision, vol. 46, no. 1, pp. 81-96, 2002.
[32] I. Robledo and S. Sarkar, “Statistical Motion Model Based on the Change of Feature Relationships: Human Gait-Based Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1323-1328, Oct. 2003.
[33] C. Sminchisescu, A. Kanaujia, and D. Metaxas, “Conditional Models for Contextual Human Motion Recognition,” Computer Vision and Image Understanding, vol. 104, no. 2, pp. 210-220, 2006.
[34] L. Morency, A. Quattoni, and T. Darrell, “Latent-Dynamic Discriminative Models for Continuous Gesture Recognition,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[35] A.M. Martinez, R.R. Wilbur, R. Shay, and A. Kak, “Purdue RVL-SLLL ASL Database for Automatic Recognition of American Sign Language,” Proc. Int'l Conf. Multimodal Interfaces, 2002.
[36] R. Yang, S. Sarkar, B.L. Loeding, and A.I. Karshmer, “Efficient Generation of Large Amount of Training Data for Sign Language Recognition: A Semi-Automatic Tool,” Proc. Conf. Computers Helping People with Special Needs, pp. 635-642, 2006.
[37] H. Brashear, T. Starner, P. Lukowicz, and H. Junker, “Using Multiple Sensors for Mobile Sign Language Recognition,” Proc. IEEE Int'l Symp. Wearable Computers, pp. 45-52, 2003.
[38] V. Levenshtein, “Binary Codes Capable of Correcting Deletions, Insertions and Reversals,” Doklady Akademii Nauk SSSR, vol. 163, no. 4, pp. 845-848, 1965.
[39] Open Source Computer Vision Library, http://www.intel.com/research/mrl/research/opencv, 2008.

Index Terms:
Sign language, movement epenthesis, continuous gesture, segmentation, level building.
Citation:
Ruiduo Yang, Sudeep Sarkar, Barbara Loeding, "Handling Movement Epenthesis and Hand Segmentation Ambiguities in Continuous Sign Language Recognition Using Nested Dynamic Programming," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 3, pp. 462-477, March 2010, doi:10.1109/TPAMI.2009.26