Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning
February 2010 (vol. 32, no. 2)
pp. 288-303
Saad Ali, Carnegie Mellon University, Pittsburgh
Mubarak Shah, University of Central Florida, Orlando
We propose a set of kinematic features, derived from the optical flow, for human action recognition in videos. The set includes divergence, vorticity, symmetric and antisymmetric flow fields, the second and third principal invariants of the flow gradient and rate-of-strain tensors, and the third principal invariant of the rate-of-rotation tensor. Each kinematic feature, when computed from the optical flow of a sequence of images, gives rise to a spatiotemporal pattern. It is then assumed that the representative dynamics of the optical flow are captured by these spatiotemporal patterns in the form of dominant kinematic trends, or kinematic modes. These kinematic modes are computed by performing Principal Component Analysis (PCA) on the spatiotemporal volumes of the kinematic features. For classification, we propose the use of multiple instance learning (MIL), in which each action video is represented by a bag of kinematic modes. Each video is then embedded into a kinematic-mode-based feature space, and its coordinates in that space are used for classification with the nearest-neighbor algorithm. Qualitative and quantitative results are reported on benchmark data sets.
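The abstract describes a complete recognition pipeline: compute dense optical flow, derive per-pixel kinematic feature maps, extract each video's dominant kinematic modes with PCA, embed the video (a bag of modes) into a mode-based feature space, and classify with a nearest-neighbor rule. The sketch below illustrates that pipeline under several simplifying assumptions: it uses OpenCV's Farneback optical flow and scikit-learn's PCA and nearest-neighbor classifier, keeps only two of the proposed features (divergence and vorticity), and substitutes a MILES-style max-similarity embedding for the paper's embedding step. All function names, parameters, and the train_videos/test_videos placeholders are illustrative, not the authors' implementation.

# Minimal sketch of the pipeline outlined in the abstract (illustrative only).
import numpy as np
import cv2
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def kinematic_features(prev_gray, next_gray):
    """Per-pixel divergence and vorticity maps from dense optical flow."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    du_dy, du_dx = np.gradient(u)            # axis 0 = rows (y), axis 1 = cols (x)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy               # local expansion/contraction
    vorticity = dv_dx - du_dy                # local rotation
    return divergence, vorticity

def kinematic_modes(frames, n_modes=5):
    """Stack per-frame feature maps into a spatiotemporal volume and keep
    its dominant PCA components as the video's kinematic modes (its bag)."""
    maps = []
    for prev_f, next_f in zip(frames[:-1], frames[1:]):
        div, vort = kinematic_features(prev_f, next_f)
        maps.append(np.concatenate([div.ravel(), vort.ravel()]))
    volume = np.stack(maps)                   # (num_frames - 1) x num_features
    return PCA(n_components=n_modes).fit(volume).components_

def embed(bag, prototypes):
    """Embed a bag of modes into the mode-based feature space: one
    coordinate per prototype mode, the bag's best similarity to it."""
    sims = prototypes @ bag.T                 # dot-product similarity
    return sims.max(axis=1)

# Usage (train_videos, train_labels, test_videos are hypothetical lists of
# grayscale frame arrays and labels; all videos share one frame resolution):
# bags = [kinematic_modes(v) for v in train_videos]
# prototypes = np.vstack(bags)               # instances pooled from all bags
# X_train = np.array([embed(b, prototypes) for b in bags])
# clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, train_labels)
# X_test = np.array([embed(kinematic_modes(v), prototypes) for v in test_videos])
# predicted_actions = clf.predict(X_test)

The max-over-instances embedding keeps the multiple instance character of the problem: a video is judged by how well any one of its modes matches each prototype, so a single discriminative mode is enough to place it near the right actions in the feature space.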

Index Terms:
Action recognition, motion, video analysis, principal component analysis, kinematic features.
Citation:
Saad Ali, Mubarak Shah, "Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288-303, Feb. 2010, doi:10.1109/TPAMI.2008.284