Quantifying and Transferring Contextual Information in Object Detection
April 2012 (vol. 34 no. 4)
pp. 762-777
Shaogang Gong, Sch. of Electron. Eng. & Comput. Sci., Queen Mary Univ. of London, London, UK
Wei-Shi Zheng, Sch. of Inf. Sci. & Technol., Sun Yat-sen Univ., Guangzhou, China
Tao Xiang, Sch. of Electron. Eng. & Comput. Sci., Queen Mary Univ. of London, London, UK
Context is critical for reducing the uncertainty in object detection. However, context modeling is challenging because there are often many different types of contextual information coexisting with different degrees of relevance to the detection of target object(s) in different images. It is therefore crucial to devise a context model to automatically quantify and select the most effective contextual information for assisting in detecting the target object. Nevertheless, the diversity of contextual information means that learning a robust context model requires a larger training set than learning the target object appearance model, which may not be available in practice. In this work, a novel context modeling framework is proposed without the need for any prior scene segmentation or context annotation. We formulate a polar geometric context descriptor for representing multiple types of contextual information. In order to quantify context, we propose a new maximum margin context (MMC) model to evaluate and measure the usefulness of contextual information directly and explicitly through a discriminant context inference method. Furthermore, to address the problem of context learning with limited data, we exploit the idea of transfer learning based on the observation that although two categories of objects can have very different visual appearance, there can be similarity in their context and/or the way contextual information helps to distinguish target objects from nontarget objects. To that end, two novel context transfer learning models are proposed which utilize training samples from source object classes to improve the learning of the context model for a target object class based on a joint maximum margin learning framework. Experiments are carried out on PASCAL VOC2005 and VOC2007 data sets, a luggage detection data set extracted from the i-LIDS data set, and a vehicle detection data set extracted from outdoor surveillance footage. Our results validate the effectiveness of the proposed models for quantifying and transferring contextual information, and demonstrate that they outperform related alternative context models.
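
The abstract does not specify how the polar geometric context descriptor is laid out, so the following is only a generic illustration of polar binning around a candidate detection, not the authors' descriptor. It assumes context is summarized by counting feature points per (ring, sector) cell around the detection center; the function name, the ring radii, and the number of sectors are all hypothetical choices made for this sketch.

```python
import math


def polar_context_histogram(center, points, radii, n_sectors):
    """Count points per (ring, sector) bin around `center`.

    Hypothetical illustration only. `radii` lists the increasing outer
    radius of each ring; points beyond the last ring are ignored.
    Returns a flat list of length len(radii) * n_sectors.
    """
    cx, cy = center
    hist = [0] * (len(radii) * n_sectors)
    sector_width = 2 * math.pi / n_sectors
    for x, y in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        # First ring whose outer radius contains the point, if any.
        ring = next((i for i, outer in enumerate(radii) if r <= outer), None)
        if ring is None:
            continue
        # Angle mapped into [0, 2*pi); clamp guards against rounding at 2*pi.
        theta = math.atan2(dy, dx) % (2 * math.pi)
        sector = min(int(theta / sector_width), n_sectors - 1)
        hist[ring * n_sectors + sector] += 1
    return hist


# Two rings (radius 2 and 4) and four 90-degree sectors around the origin.
example = polar_context_histogram(
    center=(0, 0),
    points=[(1, 1), (-1, 1), (-3, -1), (10, 10), (1, -1)],
    radii=[2, 4],
    n_sectors=4,
)
print(example)
```

A real descriptor would typically aggregate appearance features per cell rather than raw point counts, and would scale the rings with the size of the candidate window; both are left out here to keep the sketch short.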

Index Terms:
object detection, learning (artificial intelligence), outdoor surveillance footage, contextual information, uncertainty reduction, context modeling, robust context model, target object appearance model, polar geometric context descriptor, maximum margin context model, discriminant context inference method, maximum margin learning framework, PASCAL VOC2005, PASCAL VOC2007, luggage detection data set, i-LIDS data set, vehicle detection data set, Context, Detectors, Data models, Feature extraction, Kernel, transfer learning
Citation:
Shaogang Gong, Wei-Shi Zheng, Tao Xiang, "Quantifying and Transferring Contextual Information in Object Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 762-777, April 2012, doi:10.1109/TPAMI.2011.164