Issue No. 02 - February 2009 (vol. 31)
pp. 319-336
Rangachar Kasturi , University of South Florida, Tampa
Dmitry Goldgof , University of South Florida, Tampa
Padmanabhan Soundararajan , University of South Florida, Tampa
Vasant Manohar , University of South Florida, Tampa
John Garofolo , National Institute of Standards and Technology, Gaithersburg
Rachel Bowers , National Institute of Standards and Technology, Gaithersburg
Matthew Boonstra , University of South Florida, Tampa
Valentina Korzhova , University of South Florida, Tampa
Jing Zhang , University of South Florida, Tampa
ABSTRACT
Common benchmark data sets, standardized performance metrics, and baseline algorithms have had considerable impact on research and development across a variety of application domains. These resources give both consumers and developers of technology a common framework for objectively comparing the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video, specifically for face, text, and vehicle objects. The framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools, including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each data clip is approximately 2.5 minutes long and has been completely spatially and temporally annotated at the I-frame level, so each task/domain has an associated annotated corpus of approximately 450,000 frames. Annotation at this scale is unprecedented and was designed to supply the quantities of data needed for robust machine learning approaches and for statistically significant comparisons of algorithm performance. The goal of this work is to address the challenges of object detection and tracking systematically through a common evaluation framework that permits meaningful objective comparison of techniques, provides the research community with sufficient data for exploring automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes lasting resources that will serve the computer vision research community for years to come.
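To make the idea of a frame-level detection metric concrete, the sketch below computes an overlap-based detection accuracy for a single frame: detected boxes are matched one-to-one to ground-truth boxes, the intersection-over-union of each matched pair is summed, and the sum is normalized by the mean of the ground-truth and detected object counts. This is only an illustrative approximation in the spirit of the framework's per-frame metrics, not the paper's exact definition; the matching here is greedy, whereas an evaluation system would typically use an optimal assignment (e.g., the Hungarian algorithm). All function names are my own.

```python
from itertools import product

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def frame_detection_accuracy(gt, det):
    """Sum of IoU over a one-to-one greedy matching of ground-truth (gt)
    and detected (det) boxes, normalized by the mean object count."""
    if not gt and not det:
        return 1.0          # nothing to detect, nothing detected
    if not gt or not det:
        return 0.0          # all misses or all false alarms
    # Consider candidate pairs from best to worst overlap.
    pairs = sorted(((iou(g, d), i, j)
                    for (i, g), (j, d) in product(enumerate(gt), enumerate(det))),
                   reverse=True)
    used_g, used_d, overlap = set(), set(), 0.0
    for score, i, j in pairs:
        if score > 0 and i not in used_g and j not in used_d:
            used_g.add(i)
            used_d.add(j)
            overlap += score
    return overlap / ((len(gt) + len(det)) / 2.0)
```

A sequence-level score would then average this quantity over the annotated I-frames of a clip, which is how a per-frame measure scales to the roughly 450,000-frame corpora described above.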
INDEX TERMS
Performance evaluation, object detection and tracking, baseline algorithms, face, text, vehicle.
CITATION
Rangachar Kasturi, Dmitry Goldgof, Padmanabhan Soundararajan, Vasant Manohar, John Garofolo, Rachel Bowers, Matthew Boonstra, Valentina Korzhova, Jing Zhang, "Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 2, pp. 319-336, February 2009, doi:10.1109/TPAMI.2008.57