Issue No. 06 - June (2011, vol. 33), pp. 1175-1188
Bo Peng, Arizona State University, Tempe
ABSTRACT
This paper presents a robust framework for online full-body gesture spotting from visual hull data. Using view-invariant pose features as observations, hidden Markov models (HMMs) are trained to spot gestures in continuous movement data streams. The two major contributions of this paper are 1) view-invariant pose feature extraction from visual hulls, and 2) a systematic approach to automatically detecting and modeling specific nongesture movement patterns and using their HMMs for outlier rejection in gesture spotting. Experimental results demonstrate the view invariance of the proposed pose features, for both training poses and new poses unseen in training, as well as the efficacy of the specific nongesture models for outlier rejection. The framework has been extensively tested on the IXMAS gesture data set, and its gesture spotting results surpass those reported on the same data set by existing state-of-the-art gesture spotting methods.
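As a rough sketch of the spotting scheme just summarized (class-specific HMMs over pose features, with dedicated nongesture HMMs used to reject outlier movements), the following Python fragment trains one Gaussian-emission HMM per gesture and per nongesture pattern, then accepts a feature window only when its best gesture model outscores every nongesture model. This is an illustration under stated assumptions, not the paper's exact algorithm: it uses hmmlearn's GaussianHMM as a stand-in for the authors' HMM implementation, the helper names (train_models, spot) are hypothetical, the view-invariant pose features are assumed to be precomputed, and the per-window decision simplifies the paper's online spotting procedure.

    # Sketch: HMM-based gesture spotting with nongesture models for
    # outlier rejection. Feature extraction from visual hulls is assumed
    # done elsewhere; each sequence is a (T, D) array of pose features.
    import numpy as np
    from hmmlearn import hmm

    def train_models(sequences_by_label, n_states=5):
        """Fit one Gaussian-emission HMM per movement class.

        sequences_by_label maps a label (a gesture name or a detected
        nongesture pattern id) to a list of (T_i, D) feature arrays.
        """
        models = {}
        for label, seqs in sequences_by_label.items():
            X = np.vstack(seqs)               # concatenate all sequences
            lengths = [len(s) for s in seqs]  # per-sequence lengths
            m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
            m.fit(X, lengths)
            models[label] = m
        return models

    def spot(window, gesture_models, nongesture_models):
        """Label one feature window, or reject it as an outlier when some
        nongesture model explains it better than every gesture model."""
        g_scores = {g: m.score(window) for g, m in gesture_models.items()}
        best_g, best_g_ll = max(g_scores.items(), key=lambda kv: kv[1])
        best_n_ll = max(m.score(window) for m in nongesture_models.values())
        return best_g if best_g_ll > best_n_ll else None  # None = rejected

In this setup the nongesture models play the role of an adaptive rejection threshold: instead of a single fixed log-likelihood cutoff, each candidate window must beat whichever nongesture pattern fits it best.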
INDEX TERMS
Online gesture spotting, view invariance, multilinear analysis, visual hull, hidden Markov models, nongesture models.
CITATION
Bo Peng, "Online Gesture Spotting from Visual Hull Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 6, pp. 1175-1188, June 2011, doi: 10.1109/TPAMI.2010.199.