Issue No.03 - March (2013 vol.35)
pp: 582-596
Feng Zhou, Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
F. De la Torre, Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
J. K. Hodgins, Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
ABSTRACT
Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
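As a rough illustration of the kernel k-means component mentioned in the abstract, the sketch below clusters items using only a precomputed kernel matrix. It is not the authors' HACA implementation: the function name `kernel_kmeans`, the round-robin initialization, and the Gaussian kernel on 1-D points (standing in for the generalized dynamic time alignment kernel between segments) are all assumptions made for this example, and the segmentation search by dynamic programming is omitted.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50):
    """Cluster n items given only their n x n kernel matrix K.

    Hypothetical helper for illustration; not the published HACA code.
    """
    n = K.shape[0]
    # Deterministic round-robin start (an assumption; random starts are also common).
    labels = np.arange(n) % k
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster stays at infinite distance
            # Squared feature-space distance to the cluster mean:
            # K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl, with j, l in cluster c
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels

# Toy data: two tight groups of 1-D points, far apart.
X = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
# Gaussian kernel as a stand-in for a kernel between time-series segments.
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)
labels = kernel_kmeans(K, k=2)
```

Because the update needs only entries of `K`, any valid similarity between variable-length segments could in principle be plugged in where the Gaussian kernel is used here.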
INDEX TERMS
Kernel, Time series analysis, Humans, Motion segmentation, Clustering algorithms, Heuristic algorithms, Legged locomotion, dynamic programming, Temporal segmentation, time series clustering, time series visualization, human motion analysis, kernel k-means, spectral clustering
CITATION
Feng Zhou, F. De la Torre, J. K. Hodgins, "Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 3, pp. 582-596, March 2013, doi:10.1109/TPAMI.2012.137