Learning to Detect Objects in Images via a Sparse, Part-Based Representation
November 2004 (vol. 26 no. 11)
pp. 1475-1490
Shivani Agarwal; Aatif Awan; Dan Roth, IEEE Computer Society
We study the problem of detecting objects in still, gray-scale images. Our primary focus is the development of a learning-based approach to the problem that makes use of a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with spatial relations observed among the parts. Based on this representation, a learning algorithm is used to automatically learn to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and is seen to successfully detect objects in varying conditions amidst background clutter and mild occlusion. In evaluating object detection approaches, several important methodological issues arise that have not been satisfactorily addressed in previous work. A secondary focus of this paper is to highlight these issues and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.
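
The pipeline summarized in the abstract (parts from a vocabulary, spatial relations among them, then a learned detector) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: part detections are assumed to be already available as hypothetical `(part_id, x, y)` tuples, spatial relations are discretized into coarse offset buckets by the hypothetical `encode` function, and the learner is a simple Winnow-style multiplicative-update linear classifier chosen as a stand-in for "a learning algorithm".

```python
from itertools import combinations


def encode(detections, bucket=8):
    """Turn part detections into a sparse set of active binary features.

    detections: list of (part_id, x, y) tuples (hypothetical input format).
    Features are the parts present plus a coarse relative offset for each pair.
    """
    feats = {("part", pid) for pid, _, _ in detections}
    # Sort so each pair is always visited in the same canonical order.
    for (pa, xa, ya), (pb, xb, yb) in combinations(sorted(detections), 2):
        dx = (xb - xa) // bucket  # coarse horizontal offset bucket
        dy = (yb - ya) // bucket  # coarse vertical offset bucket
        feats.add(("rel", pa, pb, dx, dy))
    return feats


class Winnow:
    """Winnow-style linear threshold unit over sparse features (illustrative)."""

    def __init__(self, threshold=4.0, alpha=2.0):
        self.w = {}  # only weights that differ from the default 1.0 are stored
        self.threshold = threshold
        self.alpha = alpha

    def score(self, feats):
        return sum(self.w.get(f, 1.0) for f in feats)

    def predict(self, feats):
        return self.score(feats) >= self.threshold

    def update(self, feats, label):
        """Mistake-driven update: promote on a miss, demote on a false alarm."""
        if self.predict(feats) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for f in feats:
            self.w[f] = self.w.get(f, 1.0) * factor


def train(examples, epochs=5):
    model = Winnow()
    for _ in range(epochs):
        for detections, label in examples:
            model.update(encode(detections), label)
    return model


# Toy data: the "object" is part 0 with part 1 roughly 16 pixels to its right.
examples = [
    ([(0, 10, 20), (1, 26, 20)], True),
    ([(0, 10, 20), (2, 30, 40)], False),
    ([(1, 5, 5), (2, 50, 5)], False),
    ([(0, 30, 20), (1, 10, 20)], False),  # right parts, wrong configuration
    ([(0, 40, 8), (1, 57, 9)], True),
]
model = train(examples)
predictions = [model.predict(encode(d)) for d, _ in examples]
```

On this toy data the fourth example contains the right parts in the wrong arrangement, so only the relation feature can separate it from the positives; that is the role the spatial relations play in the representation described above.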

Index Terms:
Object detection, image representation, machine learning, evaluation/methodology.
Shivani Agarwal, Aatif Awan, Dan Roth, "Learning to Detect Objects in Images via a Sparse, Part-Based Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1475-1490, Nov. 2004, doi:10.1109/TPAMI.2004.108