Tracking-Learning-Detection
July 2012 (vol. 34 no. 7)
pp. 1409-1422
Zdenek Kalal, University of Surrey, Guildford
Krystian Mikolajczyk, University of Surrey, Guildford
Jiri Matas, Czech Technical University, Prague
This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: 1) P-expert estimates missed detections, and 2) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.
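
The decomposition described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: a frame is a list of patch labels, the detector is a set of remembered appearances, the tracker's output is passed in as an index (or None when tracking has failed), and the function names `p_expert`, `n_expert`, and `tld_step` are hypothetical. The rules the two experts use here are deliberately simplistic stand-ins for the error estimates the paper develops.

```python
from typing import List, Optional, Set


def p_expert(patches: List[str], detections: List[int],
             tracked: Optional[int]) -> List[str]:
    """Estimate missed detections (toy rule, an assumption): the patch at
    the tracked location was not reported by the detector."""
    if tracked is not None and tracked not in detections:
        return [patches[tracked]]
    return []


def n_expert(patches: List[str], detections: List[int],
             tracked: Optional[int]) -> List[str]:
    """Estimate false alarms (toy rule, an assumption): detections at any
    location other than the tracked one."""
    if tracked is None:
        return []
    return [patches[i] for i in detections if i != tracked]


def tld_step(patches: List[str], model: Set[str],
             tracked: Optional[int]) -> Optional[int]:
    """Process one frame; return the object's index or None if not present.

    `model` is the toy detector: the set of appearances accepted so far.
    It is updated in place by the learning step.
    """
    # Detection: every location whose appearance the model already knows.
    detections = [i for i, patch in enumerate(patches) if patch in model]

    # Learning: estimate the detector's errors and update the model.
    missed = p_expert(patches, detections, tracked)
    false_alarms = n_expert(patches, detections, tracked)
    model.difference_update(false_alarms)
    model.update(missed)

    # Integration: trust the tracker when it has an output; otherwise let
    # an unambiguous detection reinitialize the location.
    if tracked is not None:
        return tracked
    if len(detections) == 1:
        return detections[0]
    return None


if __name__ == "__main__":
    appearances = {"A"}
    print(tld_step(["x", "B", "y"], appearances, 1))     # tracker active
    print(tld_step(["B", "x", "y"], appearances, None))  # tracker lost
```

In this toy setting, an appearance learned while the tracker was active is what later lets the detector recover the object after the tracker has lost it.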

Index Terms:
Long-term tracking, learning from video, bootstrapping, real time, semi-supervised learning.
Citation:
Zdenek Kalal, Krystian Mikolajczyk, Jiri Matas, "Tracking-Learning-Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409-1422, July 2012, doi:10.1109/TPAMI.2011.239