Issue No.12 - December (2011 vol.33)
pp: 2368-2382
Ce Liu, Microsoft, Cambridge
Jenny Yuen, MIT, Cambridge
Antonio Torralba, MIT, Cambridge
ABSTRACT
While there has been a lot of recent work on object recognition and image understanding, the focus has been on carefully establishing mathematical models for images, scenes, and objects. In this paper, we propose a novel, nonparametric approach for object recognition and scene parsing using a new technology we name label transfer. For an input image, our system first retrieves its nearest neighbors from a large database containing fully annotated images. Then, the system establishes dense correspondences between the input image and each of the nearest neighbors using the dense SIFT flow algorithm [28], which aligns two images based on local image structures. Finally, based on the dense scene correspondences obtained from SIFT flow, our system warps the existing annotations and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Promising experimental results have been achieved by our nonparametric scene parsing system on challenging databases. Compared to existing object recognition approaches that require training classifiers or appearance models for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.
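The three stages described in the abstract (retrieval, alignment-based warping of annotations, and label integration) can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, the dense flow field is assumed to be given rather than computed by SIFT flow, the query and neighbor images are assumed to have the same size, and a simple per-pixel majority vote stands in for the Markov random field integration.

```python
from collections import Counter


def nearest_neighbors(query_desc, database_descs, k):
    """Indices of the k database descriptors closest to the query
    under squared Euclidean distance (a stand-in for scene retrieval)."""
    dists = [
        sum((q - d) ** 2 for q, d in zip(query_desc, desc))
        for desc in database_descs
    ]
    return sorted(range(len(database_descs)), key=lambda i: dists[i])[:k]


def warp_labels(labels, flow):
    """Warp a neighbor's annotation map onto the query grid.

    flow[y][x] = (dy, dx) is an assumed, precomputed integer displacement:
    query pixel (y, x) corresponds to neighbor pixel (y + dy, x + dx).
    Out-of-range coordinates are clamped to the image border.
    """
    h, w = len(labels), len(labels[0])
    warped = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = flow[y][x]
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            row.append(labels[yy][xx])
        warped.append(row)
    return warped


def vote(warped_maps):
    """Per-pixel majority vote over the warped annotation maps.
    This replaces the MRF step of the paper with the simplest possible rule."""
    h, w = len(warped_maps[0]), len(warped_maps[0][0])
    return [
        [Counter(m[y][x] for m in warped_maps).most_common(1)[0][0]
         for x in range(w)]
        for y in range(h)
    ]


def parse_scene(query_desc, database_descs, database_labels, flows, k):
    """Retrieve k neighbors, warp their labels with the given flows, and vote.
    flows[i] is the assumed flow from the query to database image i."""
    neighbors = nearest_neighbors(query_desc, database_descs, k)
    warped = [warp_labels(database_labels[i], flows[i]) for i in neighbors]
    return vote(warped)
```

A majority vote ignores the spatial smoothness and appearance cues that a Markov random field can encode, so the sketch only conveys the data flow of label transfer, not the quality of the full system.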
INDEX TERMS
Object recognition, scene parsing, label transfer, SIFT flow, Markov random fields.
CITATION
Ce Liu, Jenny Yuen, Antonio Torralba, "Nonparametric Scene Parsing via Label Transfer", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.33, no. 12, pp. 2368-2382, December 2011, doi:10.1109/TPAMI.2011.131
REFERENCES
[1] E.H. Adelson, "On Seeing Stuff: The Perception of Materials by Humans and Machines," Proc. SPIE, vol. 4299, pp. 1-12, 2001.
[2] S. Belongie, J. Malik, and J. Puzicha, "Shape Context: A New Descriptor for Shape Matching and Object Recognition," Proc. Advances in Neural Information Processing Systems, 2000.
[3] A. Berg, T. Berg, and J. Malik, "Shape Matching and Object Recognition Using Low Distortion Correspondence," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[4] I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications, second ed. Springer-Verlag, 2005.
[5] S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona, and S. Belongie, "Visual Recognition with Humans in the Loop," Proc. European Conf. Computer Vision, 2010.
[6] M.J. Choi, J.J. Lim, A. Torralba, and A. Willsky, "Exploiting Hierarchical Context on a Large Database of Object Categories," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[7] D. Crandall, P. Felzenszwalb, and D. Huttenlocher, "Spatial Priors for Part-Based Recognition Using Statistical Models," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[8] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[9] C. Desai, D. Ramanan, and C. Fowlkes, "Discriminative Models for Multi-Class Object Layout," Proc. IEEE Int'l Conf. Computer Vision, 2009.
[10] S.K. Divvala, D. Hoiem, J.H. Hays, A.A. Efros, and M. Hebert, "An Empirical Study of Context in Object Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[11] G. Edwards, T. Cootes, and C. Taylor, "Face Recognition Using Active Appearance Models," Proc. European Conf. Computer Vision, 1998.
[12] A.A. Efros and T. Leung, "Texture Synthesis by Non-Parametric Sampling," Proc. IEEE Int'l Conf. Computer Vision, 1999.
[13] P. Felzenszwalb, D. McAllester, and D. Ramanan, "A Discriminatively Trained, Multiscale, Deformable Part Model," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[14] P. Felzenszwalb and D. Huttenlocher, "Pictorial Structures for Object Recognition," Int'l J. Computer Vision, vol. 61, no. 1, pp. 55-79, 2005.
[15] R. Fergus, P. Perona, and A. Zisserman, "Object Class Recognition by Unsupervised Scale-Invariant Learning," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003.
[16] R. Fergus, P. Perona, and A. Zisserman, "Object Class Recognition by Unsupervised Scale-Invariant Learning," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003.
[17] A. Frome, Y. Singer, and J. Malik, "Image Retrieval and Classification Using Local Distance Functions," Proc. Advances in Neural Information Processing Systems, 2006.
[18] C. Galleguillos, B. McFee, S. Belongie, and G.R.G. Lanckriet, "Multi-Class Object Localization by Combining Local Contextual Interactions," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[19] K. Grauman and T. Darrell, "Pyramid Match Kernels: Discriminative Classification with Sets of Image Features," Proc. IEEE Int'l Conf. Computer Vision, 2005.
[20] A. Gupta and L.S. Davis, "Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers," Proc. European Conf. Computer Vision, 2008.
[21] J. Hays and A.A. Efros, "Scene Completion Using Millions of Photographs," ACM Trans. Graphics, vol. 26, no. 3, 2007.
[22] G. Heitz and D. Koller, "Learning Spatial Context: Using Stuff to Find Things," Proc. European Conf. Computer Vision, 2008.
[23] D. Hoiem, A. Efros, and M. Hebert, "Putting Objects in Perspective," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[24] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 2169-2178, 2006.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[26] L. Liang, C. Liu, Y.Q. Xu, B.N. Guo, and H.Y. Shum, "Real-Time Texture Synthesis by Patch-Based Sampling," ACM Trans. Graphics, vol. 20, no. 3, pp. 127-150, July 2001.
[27] C. Liu, J. Yuen, and A. Torralba, "Nonparametric Scene Parsing: Label Transfer via Dense Scene Alignment," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[28] C. Liu, J. Yuen, and A. Torralba, "SIFT Flow: Dense Correspondence across Different Scenes and Its Applications," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 978-994, May 2011.
[29] C. Liu, J. Yuen, A. Torralba, J. Sivic, and W.T. Freeman, "SIFT Flow: Dense Correspondence across Different Scenes," Proc. European Conf. Computer Vision, 2008.
[30] D.G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[31] K.P. Murphy, A. Torralba, and W.T. Freeman, "Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes," Proc. Advances in Neural Information Processing Systems, 2003.
[32] D. Nister and H. Stewenius, "Scalable Recognition with a Vocabulary Tree," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[33] S. Obdrzalek and J. Matas, "Sub-Linear Indexing for Large Scale Object Recognition," Proc. British Machine Vision Conf., 2005.
[34] A. Oliva and A. Torralba, "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope," Int'l J. Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.
[35] D. Park, D. Ramanan, and C. Fowlkes, "Multiresolution Models for Object Detection," Proc. European Conf. Computer Vision, 2010.
[36] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, "Objects in Context," Proc. IEEE Int'l Conf. Computer Vision, 2007.
[37] B.C. Russell, A.A. Efros, J. Sivic, W.T. Freeman, and A. Zisserman, "Segmenting Scenes by Matching Image Composites," Proc. Advances in Neural Information Processing Systems, 2009.
[38] B.C. Russell, A. Torralba, C. Liu, R. Fergus, and W.T. Freeman, "Object Recognition by Scene Alignment," Proc. Advances in Neural Information Processing Systems, 2007.
[39] B.C. Russell, A. Torralba, K.P. Murphy, and W.T. Freeman, "LabelMe: A Database and Web-Based Tool for Image Annotation," Int'l J. Computer Vision, vol. 77, nos. 1-3, pp. 157-173, 2008.
[40] S. Savarese, J. Winn, and A. Criminisi, "Discriminative Object Class Models of Appearance and Shape by Correlatons," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[41] G. Shakhnarovich, P. Viola, and T. Darrell, "Fast Pose Estimation with Parameter Sensitive Hashing," Proc. IEEE Int'l Conf. Computer Vision, 2003.
[42] E. Shechtman and M. Irani, "Matching Local Self-Similarities across Images and Videos," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[43] J. Shotton, J. Winn, C. Rother, and A. Criminisi, "TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context," Int'l J. Computer Vision, vol. 81, no. 1, pp. 2-23, 2009.
[44] J. Sivic and A. Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos," Proc. IEEE Int'l Conf. Computer Vision, 2003.
[45] E. Sudderth, A. Torralba, W.T. Freeman, and A. Willsky, "Describing Visual Scenes Using Transformed Dirichlet Processes," Proc. Advances in Neural Information Processing Systems, 2005.
[46] J. Tighe and S. Lazebnik, "Superparsing: Scalable Nonparametric Image Parsing with Superpixels," Proc. European Conf. Computer Vision, 2010.
[47] A. Torralba, R. Fergus, and W.T. Freeman, "80 Million Tiny Images: A Large Dataset for Non-Parametric Object and Scene Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958-1970, Nov. 2008.
[48] M. Turk and A. Pentland, "Face Recognition Using Eigenfaces," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1991.
[49] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[50] M. Weber, M. Welling, and P. Perona, "Unsupervised Learning of Models for Recognition," Proc. European Conf. Computer Vision, 2000.
[51] J. Winn, A. Criminisi, and T. Minka, "Object Categorization by Learned Universal Visual Dictionary," Proc. IEEE Int'l Conf. Computer Vision, 2005.
[52] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, "SUN Database: Large-Scale Scene Recognition from Abbey to Zoo," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[53] Y. Yang, S. Hallman, D. Ramanan, and C. Fowlkes, "Layered Object Detection for Multi-Class Segmentation," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.