Issue No.07 - July (2013 vol.35)
pp: 1704-1716
Wei-Lwun Lu , Google, Mountain View, CA, USA
J.-A Ting , Robert Bosch Res. & Technol. Center, Palo Alto, CA, USA
J. J. Little , Dept. of Comput. Sci., Univ. of British Columbia, Vancouver, BC, Canada
K. P. Murphy , Google, Mountain View, CA, USA
ABSTRACT
Tracking and identifying players in sports videos filmed with a single pan-tilt-zoom camera has many applications, but it is also a challenging problem. This paper introduces a system that tackles this difficult task. The system detects and tracks multiple players, estimates the homography between video frames and the court, and identifies the players. The identification system combines three weak visual cues, and exploits both temporal and mutual exclusion constraints in a Conditional Random Field (CRF). In addition, we propose a novel Linear Programming (LP) Relaxation algorithm for predicting the best player identification in a video clip. To reduce the amount of labeled training data required to learn the identification system, we make use of weakly supervised learning with the assistance of play-by-play texts. Experiments show promising results in tracking, homography estimation, and identification. Moreover, weakly supervised learning with play-by-play texts greatly reduces the number of labeled training examples required. The identification system can achieve similar accuracies using merely 200 labels in weakly supervised learning, while a strongly supervised approach needs at least 20,000 labels.
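The mutual exclusion constraint mentioned in the abstract can be illustrated on a single frame: no two detections in the same frame may be assigned the same player. The sketch below is not the paper's CRF or its LP Relaxation algorithm. It is a brute-force search over one toy frame, and the function name `best_exclusive_labeling` and the score values are hypothetical, chosen only to show how the constraint changes the labeling.

```python
from itertools import permutations


def best_exclusive_labeling(scores):
    """Pick one distinct player per detection, maximizing the summed score.

    scores[d][p] is a hypothetical score that detection d shows player p.
    This is exhaustive search, practical only for a handful of detections.
    """
    n_det = len(scores)
    n_players = len(scores[0])
    if n_det > n_players:
        raise ValueError("more detections than candidate players")
    best_total, best_labels = float("-inf"), None
    # Each permutation assigns a different player to every detection,
    # which is exactly the mutual exclusion constraint within one frame.
    for labels in permutations(range(n_players), n_det):
        total = sum(scores[d][p] for d, p in enumerate(labels))
        if total > best_total:
            best_total, best_labels = total, labels
    return list(best_labels), best_total


# Toy frame with 3 detections and 3 candidate players (made-up numbers).
scores = [
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]

# Labeling each detection independently lets two detections claim player 0.
independent = [max(range(3), key=lambda p: row[p]) for row in scores]

# Enforcing mutual exclusion resolves the conflict.
exclusive, total = best_exclusive_labeling(scores)
print(independent, exclusive)
```

In this toy frame, independent labeling gives player 0 to both the first and second detections, while the constrained search moves the first detection to player 1. Exhaustive search grows factorially with the number of detections, which is one reason a relaxation-based method is attractive for whole video clips.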
INDEX TERMS
Videos, Image color analysis, Cameras, Visualization, Vectors, Feature extraction, Supervised learning, weakly supervised learning, Sports video analysis, identification, tracking
CITATION
Wei-Lwun Lu, J.-A Ting, J. J. Little, K. P. Murphy, "Learning to Track and Identify Players from Broadcast Sports Videos", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 7, pp. 1704-1716, July 2013, doi:10.1109/TPAMI.2012.242