Issue No.05 - May (2009 vol.31)
pp: 795-810
Sunita Nayak, Photometria Inc., San Diego
Sudeep Sarkar, University of South Florida, Tampa
Barbara Loeding, University of South Florida, Lakeland
ABSTRACT
Some articulated motion representations rely on frame-wise abstractions of the statistical distribution of low-level features such as orientation, color, or relational distributions. As the configuration among parts changes with articulated motion, the distribution changes, tracing a trajectory in the latent space of distributions, which we call the configuration space. These trajectories can then be used for recognition with standard techniques such as dynamic time warping. The core theory in this paper concerns embedding the frame-wise distributions, which can be viewed as probability functions, into a low-dimensional space so that various meaningful probabilistic distances, such as the Chernoff, Bhattacharyya, Matusita, Kullback-Leibler (KL), or symmetric-KL distances, can be estimated from dot products between points in this space. Apart from computational advantages, this representation also affords speed-normalized matching of motion signatures. Speed-normalized representations can be formed by interpolating the configuration trajectories along their arc lengths, without using any knowledge of the temporal scale variations between the sequences. We experiment with five different probabilistic distance measures and show the usefulness of the representation in three different contexts: sign recognition (with a large number of possible classes), gesture recognition (with person variations), and classification of human-human interaction sequences (with segmentation problems). We find that using the right distance measure for each situation is important. The low-dimensional embedding makes matching two to three times faster, while achieving recognition accuracies close to those obtained without a low-dimensional embedding. We also empirically establish the robustness of the representation with respect to low-level parameters, embedding parameters, and temporal-scale parameters.
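The speed normalization described above, interpolating a trajectory along its arc length, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `resample_by_arc_length`, the use of linear interpolation, and the choice of NumPy are all assumptions made for illustration. The idea is that two trajectories tracing the same path at different speeds yield the same samples once they are re-indexed by distance travelled rather than by frame number.

```python
import numpy as np

def resample_by_arc_length(trajectory, num_samples):
    """Resample a (T, d) trajectory at points equally spaced along its arc length.

    Hypothetical helper for illustration; assumes linear interpolation
    between consecutive frames and at least two distinct points.
    """
    points = np.asarray(trajectory, dtype=float)
    # Euclidean length of each segment between consecutive frames.
    segment_lengths = np.linalg.norm(np.diff(points, axis=0), axis=1)
    # Drop frames that repeat the previous point so arc length is strictly increasing.
    keep = np.concatenate(([True], segment_lengths > 0))
    points = points[keep]
    # Cumulative arc length at each remaining frame, starting at 0.
    arc = np.concatenate(([0.0], np.cumsum(segment_lengths[segment_lengths > 0])))
    # Target positions evenly spaced in arc length, from start to end of the path.
    targets = np.linspace(0.0, arc[-1], num_samples)
    # Interpolate each coordinate as a function of arc length.
    columns = [np.interp(targets, arc, points[:, k]) for k in range(points.shape[1])]
    return np.column_stack(columns)

# The same straight-line path sampled at uniform speed and at accelerating speed.
t_uniform = np.linspace(0.0, 1.0, 20)
t_accelerating = np.linspace(0.0, 1.0, 35) ** 2
slow = np.column_stack([t_uniform, 2.0 * t_uniform])
fast = np.column_stack([t_accelerating, 2.0 * t_accelerating])

slow_normalized = resample_by_arc_length(slow, 10)
fast_normalized = resample_by_arc_length(fast, 10)
```

In this toy case the two resampled trajectories coincide even though the inputs have different lengths and timing, so no temporal scale between the sequences needs to be known. For curved paths the agreement would only be approximate, since the piecewise-linear arc length depends on how densely each sequence samples the curve.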
INDEX TERMS
Human motion classification, embedding probability density functions, gesture recognition, sign language recognition.
CITATION
Sunita Nayak, Sudeep Sarkar, Barbara Loeding, "Distribution-Based Dimensionality Reduction Applied to Articulated Motion Recognition", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 5, pp. 795-810, May 2009, doi:10.1109/TPAMI.2008.80