Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol
Issue No. 02 - February (2009 vol. 31)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2008.57
Vasant Manohar, University of South Florida, Tampa
Dmitry Goldgof, University of South Florida, Tampa
Matthew Boonstra, University of South Florida, Tampa
Rachel Bowers, National Institute of Standards and Technology, Gaithersburg
John Garofolo, National Institute of Standards and Technology, Gaithersburg
Valentina Korzhova, University of South Florida, Tampa
Padmanabhan Soundararajan, University of South Florida, Tampa
Rangachar Kasturi, University of South Florida, Tampa
Jing Zhang, University of South Florida, Tampa
Common benchmark data sets, standardized performance metrics, and baseline algorithms have had considerable impact on research and development in a variety of application domains. These resources give both consumers and developers of technology a common framework for objectively comparing the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video, specifically for face, text, and vehicle objects. The framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools, including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each clip is approximately 2.5 minutes long and has been completely annotated, spatially and temporally, at the I-frame level. Each task/domain therefore has an associated annotated corpus of approximately 450,000 frames. Annotation at this scale is unprecedented; it was designed to begin to supply the quantities of data needed for robust machine learning approaches and for statistically significant comparisons of algorithm performance. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes lasting resources at a scale that will serve the computer vision research community for years to come.
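The corpus size quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the paper: the clip counts and clip length come from the abstract, but the frame rate of 30 frames per second is an assumption (the abstract does not state one), and `corpus_frames` is a hypothetical helper name used only for illustration.

```python
# Back-of-the-envelope check of the per-task corpus size.
# From the abstract: 50 training clips, 50 test clips, about 2.5 minutes each.
TRAIN_CLIPS = 50
TEST_CLIPS = 50
CLIP_MINUTES = 2.5

# ASSUMPTION: roughly 30 frames per second; not stated in the abstract.
ASSUMED_FPS = 30


def corpus_frames(clips, minutes_per_clip, fps):
    """Return the approximate number of video frames in `clips` clips."""
    seconds_per_clip = minutes_per_clip * 60
    return round(clips * seconds_per_clip * fps)


total = corpus_frames(TRAIN_CLIPS + TEST_CLIPS, CLIP_MINUTES, ASSUMED_FPS)
print(total)
```

Under the assumed frame rate, the result matches the "approximately 450,000 frames" figure given for each task/domain. A different frame rate would scale the total proportionally.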
Performance evaluation, object detection and tracking, baseline algorithms, face, text, vehicle.
Vasant Manohar, Dmitry Goldgof, Matthew Boonstra, Rachel Bowers, John Garofolo, Valentina Korzhova, Padmanabhan Soundararajan, Rangachar Kasturi, Jing Zhang, "Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 2, pp. 319-336, February 2009, doi:10.1109/TPAMI.2008.57