IEEE Transactions on Pattern Analysis & Machine Intelligence, Issue No.09 - Sept. (2012 vol.34)

pp: 1842-1855

M. Andreetto, Google Los Angeles (US-LAX-BIN), Venice, CA, USA

L. Zelnik-Manor, Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel

P. Perona, Dept. of Electr. Eng., California Inst. of Technol., Pasadena, CA, USA

ABSTRACT

Which one comes first: segmentation or recognition? We propose a unified framework for carrying out the two simultaneously and without supervision. The framework combines a flexible probabilistic model, for representing the shape and appearance of each segment, with the popular “bag of visual words” model for recognition. If applied to a collection of images, our framework can simultaneously discover the segments of each image and the correspondence between such segments, without supervision. Such recurring segments may be thought of as the “parts” of corresponding objects that appear multiple times in the image collection. Thus, the model may be used for learning new categories, detecting/classifying objects, and segmenting images, without using expensive human annotation.

INDEX TERMS

Image segmentation, Probabilistic logic, Visualization, Shape, Image recognition, Pattern analysis, scene analysis, Computer vision, image segmentation, unsupervised object recognition, graphical models, density estimation

CITATION

M. Andreetto, L. Zelnik-Manor, P. Perona, "Unsupervised Learning of Categorical Segments in Image Collections", *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 34, no. 9, pp. 1842-1855, Sept. 2012, doi:10.1109/TPAMI.2011.268