Issue No.11 - November (2008 vol.30)
pp: 2040-2046
Jeroen F. Lichtenauer, Delft University of Technology, Delft
Emile A. Hendriks, Delft University of Technology, Delft
Marcel J.T. Reinders, Delft University of Technology, Delft
ABSTRACT
To recognize speech, handwriting, or sign language, many hybrid approaches have been proposed that combine Dynamic Time Warping (DTW) or Hidden Markov Models (HMM) with discriminative classifiers. However, all of these methods rely directly on the likelihood models of DTW/HMM. We hypothesize that time warping and classification should be separated because they place conflicting demands on likelihood modelling. To overcome this restriction, we propose to use Statistical DTW (SDTW) only for time warping, while classifying the warped features with a different method. Two novel statistical classifiers are proposed (CDFD and Q-DFFM), both using a selection of discriminative features (DF), and are shown to outperform HMM and SDTW. However, we have found that combining likelihoods of multiple models in a second classification stage degrades the performance of the proposed classifiers, while improving performance with HMM and SDTW. A proof-of-concept experiment, combining DFFM mappings of multiple SDTW models with SDTW likelihoods, shows that hybrid classification can also provide significant improvement over SDTW for model combining. Although recognition is mainly based on 3D hand motion features, these results can be expected to generalize to recognition with more detailed measurements such as hand/body pose and facial expression.
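The separation of warping and classification described above can be illustrated with a minimal sketch. It is not the authors' method: it assumes plain DTW with an absolute-difference cost in place of SDTW, a nearest-class-mean rule in place of CDFD or Q-DFFM, and 1-D toy sequences in place of 3D hand motion features. All function names (`dtw_path`, `warp_to_reference`, `nearest_mean_classifier`) are hypothetical.

```python
def dtw_path(seq, ref):
    """Plain DTW (assumption: not SDTW). Returns the alignment path as
    a list of (seq_index, ref_index) pairs from start to end."""
    n, m = len(seq), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq[i - 1] - ref[j - 1])  # local distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def warp_to_reference(seq, ref):
    """Use DTW only for warping: map seq onto the reference time axis,
    averaging the samples aligned with each reference frame."""
    buckets = [[] for _ in ref]
    for i, j in dtw_path(seq, ref):
        buckets[j].append(seq[i])
    return [sum(b) / len(b) for b in buckets]


def nearest_mean_classifier(train):
    """Independent classifier (assumption: nearest class mean) trained on
    warped, fixed-length vectors. train maps label -> list of vectors."""
    means = {
        label: [sum(col) / len(col) for col in zip(*vecs)]
        for label, vecs in train.items()
    }

    def predict(vec):
        return min(
            means,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, means[lab])),
        )

    return predict


# Toy data: one reference template, two classes with varying timing.
ref = [0, 1, 2, 1, 0]
raw = {
    "peak": [[0, 0, 1, 2, 2, 1, 0], [0, 1, 1, 2, 1, 0, 0]],
    "flat": [[0, 0.2, 0.1, 0.2, 0], [0, 0.1, 0.2, 0.1, 0, 0]],
}
train = {lab: [warp_to_reference(s, ref) for s in seqs] for lab, seqs in raw.items()}
predict = nearest_mean_classifier(train)

print(predict(warp_to_reference([0, 1, 2, 2, 2, 1, 0, 0], ref)))
print(predict(warp_to_reference([0, 0.1, 0.1, 0.2, 0.1, 0], ref)))
```

In this sketch the DTW cost is used only to find the alignment. The class decision is made on the warped feature vectors, so the classifier can be replaced without changing the warping step.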
INDEX TERMS
Time series analysis, Face and gesture recognition, Markov processes, Classifier design and evaluation, Real-time systems, 3D/stereo scene analysis, Vision and Scene Understanding, Artificial Intelligence, Computing Methodology
CITATION
Jeroen F. Lichtenauer, Emile A. Hendriks, Marcel J.T. Reinders, "Sign Language Recognition by Combining Statistical DTW and Independent Classification", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 11, pp. 2040-2046, November 2008, doi:10.1109/TPAMI.2008.123