Issue No. 10 - Oct. 2013 (vol. 35)
pp. 2468-2483
Yongmian Zhang, IT Res. Div., Konica Minolta Lab. U.S.A. Inc., San Mateo, CA, USA
Yifan Zhang, Dept. of Electr., Comput. & Syst. Eng., Rensselaer Polytech. Inst., Troy, NY, USA
E. Swears, Dept. of Electr., Comput. & Syst. Eng., Rensselaer Polytech. Inst., Troy, NY, USA
N. Larios, Dept. of Electr., Comput. & Syst. Eng., Rensselaer Polytech. Inst., Troy, NY, USA
Ziheng Wang, Dept. of Electr., Comput. & Syst. Eng., Rensselaer Polytech. Inst., Troy, NY, USA
Qiang Ji, Dept. of Electr., Comput. & Syst. Eng., Rensselaer Polytech. Inst., Troy, NY, USA
ABSTRACT
Complex activities typically consist of multiple primitive events happening in parallel or sequentially over a period of time. Understanding such activities requires not only recognizing each individual event but, more importantly, capturing their spatiotemporal dependencies over different time intervals. Most current graphical model-based approaches have several limitations. First, time-sliced graphical models such as hidden Markov models (HMMs) and dynamic Bayesian networks are typically based on points in time, and hence can capture only three temporal relations: precedes, follows, and equals. Second, HMMs are probabilistic finite-state machines whose state space grows exponentially as the number of parallel events increases. Third, other approaches such as syntactic and description-based methods, while rich in modeling temporal relationships, lack the expressive power to capture uncertainties. To address these issues, we introduce the interval temporal Bayesian network (ITBN), a novel graphical model that combines the Bayesian network with interval algebra to explicitly model temporal dependencies over time intervals. Advanced machine learning methods are introduced to learn the ITBN model structure and parameters. Experimental results show that, by reasoning with spatiotemporal dependencies, the proposed model achieves significantly improved performance in modeling and recognizing complex activities involving both parallel and sequential events.
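To make the contrast between point-based and interval-based temporal relations concrete, the following is a minimal sketch, assuming the interval algebra referred to in the abstract is Allen's, which defines 13 base relations between two intervals (before, meets, overlaps, starts, during, finishes, equals, and the six inverses). The function name `allen_relation` and the `(start, end)` tuple representation are illustrative assumptions, not part of the ITBN formulation, and the sketch covers only the deterministic relation between two observed intervals, not the probabilistic model.

```python
from typing import Tuple

# Hypothetical representation: an event interval is (start, end) with start < end.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen base relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if not (a_start < a_end and b_start < b_end):
        raise ValueError("intervals must have positive duration")

    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals sharing one or both endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # One interval strictly inside the other.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Remaining case: a proper partial overlap.
    return "overlaps" if a_start < b_start else "overlapped-by"


if __name__ == "__main__":
    # Two events that run partly in parallel.
    print(allen_relation((0, 4), (3, 5)))
```

A model that sees only time points would register these two events merely as co-occurring in some slices, whereas the interval view distinguishes a partial overlap from, say, one event lying strictly inside the other.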
INDEX TERMS
Hidden Markov models, Bayesian methods, Computational modeling, Probabilistic logic, Uncertainty, Graphical models, Interval temporal Bayesian networks, Activity recognition, Temporal reasoning, Bayesian networks
CITATION
Yongmian Zhang, Yifan Zhang, E. Swears, N. Larios, Ziheng Wang, Qiang Ji, "Modeling Temporal Interactions with Interval Temporal Bayesian Networks for Complex Activity Recognition", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 10, pp. 2468-2483, Oct. 2013, doi:10.1109/TPAMI.2013.33