Issue No. 12, Dec. 2013 (vol. 35)
pp: 2854-2865
Ali Farhadi , Dept. of Comput. Sci. & Eng., Univ. of Washington, Seattle, WA, USA
Mohammad Amin Sadeghi , Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
ABSTRACT
In this paper, we introduce visual phrases, complex visual composites like "a person riding a horse." Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline that detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multiclass detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems, and we show that this procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.
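For readers unfamiliar with the decoding step the abstract refers to, the sketch below shows conventional greedy, per-class non-maximum suppression, the baseline decoding mentioned above. It is NOT the paper's novel context-aware decoding procedure, which this sketch does not attempt to reproduce. The box format `(x1, y1, x2, y2)`, the 0.5 overlap threshold, and the function names `iou`, `greedy_nms`, and `multiclass_nms` are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], overlap: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept detections, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept
        # (higher-scoring) box by more than the threshold.
        if all(iou(boxes[i], boxes[k]) <= overlap for k in keep):
            keep.append(i)
    return keep


def multiclass_nms(
    detections: Dict[str, Tuple[List[Box], List[float]]], overlap: float = 0.5
) -> Dict[str, List[int]]:
    """Decode each class independently; classes never suppress one another."""
    return {c: greedy_nms(b, s, overlap) for c, (b, s) in detections.items()}


if __name__ == "__main__":
    dets = {
        "person": ([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7]),
        "person riding horse": ([(0, 0, 12, 20)], [0.6]),
    }
    print(multiclass_nms(dets))  # {'person': [0, 2], 'person riding horse': [0]}
```

Because each class is decoded in isolation, a "person riding horse" detection and an overlapping "person" detection survive side by side with no interaction between them. Local context between detections of different classes is exactly what this kind of per-class decoding ignores.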
INDEX TERMS
Data visualization, Detectors, Decoding, Object recognition, Image processing, Complexity theory, object subcategories, Visual phrase, phrasal recognition, visual composites, object recognition, object interactions, scene understanding, single image activity recognition
CITATION
Ali Farhadi, Mohammad Amin Sadeghi, "Phrasal Recognition," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 12, pp. 2854-2865, Dec. 2013, doi:10.1109/TPAMI.2013.168