Issue No.01 - January (2011 vol.33)
pp: 172-185
Imran N. Junejo, University of Sharjah, Sharjah, UAE
Emilie Dexter, INRIA Rennes-Bretagne Atlantique, Universitaire de Beaulieu, France
Ivan Laptev, INRIA Paris-Rocquencourt/ENS, Paris
Patrick Pérez, Thomson R&D, Cesson-Sévigné
ABSTRACT
This paper addresses recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building upon this key observation, we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Despite this temporal self-similarity descriptor not being strictly view-invariant, we provide intuition and experimental validation demonstrating its high stability under view changes. Self-similarity descriptors are also shown to be stable under performance variations within a class of actions when individual speed fluctuations are ignored. If required, such fluctuations between two different instances of the same action class can be explicitly recovered with dynamic time warping, as will be demonstrated, to achieve cross-view action synchronization. More central to the current work, temporal ordering of local self-similarity descriptors can simply be ignored within a bag-of-features type of approach. Sufficient action discrimination is still retained in this way to build a view-independent action recognition system. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multiview correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public data sets. It has similar or superior performance compared to related methods and it performs well even in extreme conditions, such as when recognizing actions from top views while using side views only for training.
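As a rough illustration of two ideas named in the abstract, the sketch below builds a temporal self-similarity matrix (pairwise distances between per-frame feature vectors) and computes a classic dynamic time warping cost between two sequences. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Euclidean distance, and the toy 2D point trajectory are all hypothetical choices made for illustration. The toy "view change" is a rigid in-plane rotation, under which distances are preserved exactly; for real camera view changes the abstract claims only high stability, not strict invariance. DTW is applied here to raw 1D signals for brevity rather than to self-similarity descriptors.

```python
import math


def self_similarity_matrix(features):
    """Return the T x T matrix of Euclidean distances between frames.

    `features` is a list of equal-length per-frame feature vectors
    (an illustrative stand-in for any per-frame image feature).
    """
    n = len(features)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            ssm[i][j] = ssm[j][i] = d  # symmetric, zero diagonal
    return ssm


def rotate(points, angle):
    """Rotate 2D points about the origin (toy stand-in for a view change)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def dtw_cost(a, b, dist):
    """Classic dynamic time warping cost between sequences a and b."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


# Toy trajectory: one tracked point moving along an arc over 8 frames.
trajectory = [(math.cos(0.3 * t), math.sin(0.3 * t)) for t in range(8)]
ssm_a = self_similarity_matrix(trajectory)
ssm_b = self_similarity_matrix(rotate(trajectory, 1.0))

# Largest entry-wise difference between the two matrices.
max_diff = max(
    abs(ssm_a[i][j] - ssm_b[i][j]) for i in range(8) for j in range(8)
)

# The same 1D signal performed at half speed (each sample repeated).
signal = [0, 1, 2, 1, 0]
slow_signal = [v for v in signal for _ in range(2)]
warp_cost = dtw_cost(signal, slow_signal, lambda x, y: abs(x - y))
```

In this toy setting `max_diff` is zero up to floating-point error, and `warp_cost` is zero because warping absorbs the speed difference between the two performances.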
INDEX TERMS
Human action recognition, human action synchronization, view invariance, temporal self-similarities, local temporal descriptors.
CITATION
Imran N. Junejo, Emilie Dexter, Ivan Laptev, Patrick Pérez, "View-Independent Action Recognition from Temporal Self-Similarities", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 1, pp. 172-185, January 2011, doi:10.1109/TPAMI.2010.68