Issue No.03 - March (2010 vol.32)
pp: 530-545
M. Pawan Kumar , Stanford University, Stanford
A. Zisserman , University of Oxford, Oxford
ABSTRACT
We present a probabilistic method for segmenting instances of a particular object category within an image. Our approach overcomes the deficiencies of previous segmentation techniques based on traditional grid conditional random fields (CRFs), namely that 1) they require the user to provide seed pixels for the foreground and the background and 2) they provide a poor prior for specific shapes due to the small neighborhood size of grid CRFs. Specifically, we automatically obtain the pose of the object in a given image instead of relying on manual interaction. Furthermore, we employ a probabilistic model which includes shape potentials for the object to incorporate top-down information that is global across the image, in addition to the grid clique potentials which provide the bottom-up information used in previous approaches. The shape potentials are provided by the pose of the object obtained using an object category model. We represent articulated object categories using a novel layered pictorial structures model. Nonarticulated object categories are modeled using a set of exemplars. These object category models have the advantage that they can handle large intraclass shape, appearance, and spatial variation. We develop an efficient method, OBJCUT, to obtain segmentations using our probabilistic framework. Novel aspects of this method include: 1) efficient algorithms for sampling the object category models of our choice and 2) the observation that a sampling-based approximation of the expected log-likelihood of the model can be increased by a single graph cut. Results are presented on several articulated (e.g., animals) and nonarticulated (e.g., fruits) object categories. We provide a favorable comparison of our method with the state of the art in object category specific image segmentation, specifically the methods of Leibe and Schiele and of Schoenemann and Cremers.
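The second novel aspect can be illustrated with a small sketch. It is not the authors' implementation: it assumes each sampled pose is reduced to a binary coverage mask, that the shape potential is a simple penalty `lam` for disagreeing with that mask, and that the neighborhood term is a constant Potts penalty `smooth`. The names `segment`, `source_side_of_min_cut`, `pose_masks`, and `pose_weights` are hypothetical. Under those assumptions, the weighted sum of the per-sample energies is again a sum of unary and pairwise terms, so one s-t minimum cut minimizes it.

```python
from collections import deque


def source_side_of_min_cut(cap, s, t):
    """Edmonds-Karp max-flow; returns the nodes still reachable from s."""
    n = len(cap)
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:  # no augmenting path left: the cut is found
            return {v for v in range(n) if parent[v] != -1}
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]


def segment(fg_cost, bg_cost, pose_masks, pose_weights, edges, lam, smooth):
    """Label pixels 1 (object) or 0 (background) with a single graph cut.

    Assumption (not from the paper): each pose sample is a 0/1 mask and
    the shape cost is lam times the disagreement with that mask.
    All costs must be nonnegative.
    """
    n = len(fg_cost)
    total = sum(pose_weights)
    # Weighted average of the masks: expected object coverage per pixel.
    cover = [
        sum(w * m[p] for w, m in zip(pose_weights, pose_masks)) / total
        for p in range(n)
    ]
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for p in range(n):
        # Edge s->p is cut when p is background, p->t when p is object.
        cap[s][p] = bg_cost[p] + lam * cover[p]
        cap[p][t] = fg_cost[p] + lam * (1.0 - cover[p])
    for a, b in edges:
        cap[a][b] += smooth
        cap[b][a] += smooth
    foreground = source_side_of_min_cut(cap, s, t)
    return [1 if p in foreground else 0 for p in range(n)]


# Five pixels in a row with uninformative appearance costs, so the
# result is decided by the two weighted pose samples.
labels = segment(
    fg_cost=[1.0] * 5,
    bg_cost=[1.0] * 5,
    pose_masks=[[0, 1, 1, 1, 0], [0, 0, 1, 1, 1]],
    pose_weights=[0.75, 0.25],
    edges=[(0, 1), (1, 2), (2, 3), (3, 4)],
    lam=2.0,
    smooth=0.25,
)
print(labels)  # [0, 1, 1, 1, 0]
```

In this toy example the appearance costs are deliberately uninformative, so the segmentation follows the more heavily weighted pose sample. A real system would use a dedicated max-flow library rather than the dense-matrix Edmonds-Karp routine shown here.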
INDEX TERMS
Object category specific segmentation, conditional random fields, generalized EM, graph cuts.
CITATION
M. Pawan Kumar, A. Zisserman, "OBJCUT: Efficient Segmentation Using Top-Down and Bottom-Up Cues", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.32, no. 3, pp. 530-545, March 2010, doi:10.1109/TPAMI.2009.16
REFERENCES
[1] A. Agarwal and B. Triggs, “Tracking Articulated Motion Using a Mixture of Autoregressive Models,” Proc. European Conf. Computer Vision, vol. III, pp. 54-65, 2004.
[2] A. Blake, C. Rother, M. Brown, P. Perez, and P.H.S. Torr, “Interactive Image Segmentation Using an Adaptive GMMRF Model,” Proc. European Conf. Computer Vision, vol. I, pp. 428-441, 2004.
[3] A. Blake and A. Zisserman, Visual Reconstruction. MIT Press, 1987.
[4] E. Borenstein and S. Ullman, “Class-Specific, Top-Down Segmentation,” Proc. European Conf. Computer Vision, vol. II, pp. 109-124, 2002.
[5] Y. Boykov and M.P. Jolly, “Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images,” Proc. IEEE Int'l Conf. Computer Vision, vol. I, pp. 105-112, 2001.
[6] J.M. Coughlan, A.L. Yuille, C. English, and D. Snow, “Efficient Optimization of a Deformable Template Using Dynamic Programming,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 747-752, 1998.
[7] D. Cremers, N. Sochen, and C. Schnoerr, “Multiphase Dynamic Labelling for Variational Recognition-Driven Image Segmentation,” Int'l J. Computer Vision, vol. 66, pp. 67-81, 2006.
[8] P.F. Felzenszwalb, “Representation and Detection of Deformable Shapes,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. I, pp. 102-108, 2003.
[9] P.F. Felzenszwalb and D.P. Huttenlocher, “Efficient Matching of Pictorial Structures,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 66-73, 2000.
[10] P.F. Felzenszwalb and D.P. Huttenlocher, “Fast Algorithms for Large State Space HMMs with Applications to Web Usage Analysis,” Proc. Advances in Neural Information Processing Systems, 2003.
[11] R. Fergus, P. Perona, and A. Zisserman, “Object Class Recognition by Unsupervised Scale Invariant Learning,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 264-271, 2003.
[12] M.A. Fischler and R.A. Elschlager, “The Representation and Matching of Pictorial Structures,” IEEE Trans. Computers, vol. 22, no. 1, pp. 67-92, Jan. 1973.
[13] D. Freedman and T. Zhang, “Interactive Graph Cut Based Segmentation with Shape Priors,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. I, pp. 755-762, 2005.
[14] D.M. Gavrila, “Pedestrian Detection from a Moving Vehicle,” Proc. European Conf. Computer Vision, vol. II, pp. 37-49, 2000.
[15] A. Gelman, J. Carlin, H. Stern, and D. Rubin, Bayesian Data Analysis. Chapman and Hall, 1995.
[16] S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, Nov. 1984.
[17] J. Goldstein, J. Platt, and C. Burges, “Redundant Bit Vectors for Quickly Searching High-Dimensional Regions,” Proc. Deterministic and Statistical Methods in Machine Learning, pp. 137-158, 2005.
[18] P. Hammer, “Some Network Flow Problems Solved with Pseudo Boolean Programming,” Operations Research, vol. 13, pp. 388-399, 1965.
[19] R. Huang, V. Pavlovic, and D.N. Metaxas, “A Graphical Model Framework for Coupling MRFs and Deformable Models,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 739-746, 2004.
[20] P. Kohli, M.P. Kumar, and P.H.S. Torr, “P3 & Beyond: Solving Energies with Higher Order Cliques,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[21] V. Kolmogorov and R. Zabih, “What Energy Functions Can Be Minimized via Graph Cuts,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147-159, Feb. 2004.
[22] M.P. Kumar, P.H.S. Torr, and A. Zisserman, “Extending Pictorial Structures for Object Recognition,” Proc. British Machine Vision Conf., vol. II, pp. 789-798, 2004.
[23] M.P. Kumar, P.H.S. Torr, and A. Zisserman, “OBJCUT,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. I, pp. 18-25, 2005.
[24] M.P. Kumar, P.H.S. Torr, and A. Zisserman, “Learning Layered Motion Segmentations of Video,” Int'l J. Computer Vision, vol. 76, no. 3, pp. 301-319, 2008.
[25] J. Lafferty, A. McCallum, and F. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labelling Sequence Data,” Proc. Int'l Conf. Machine Learning, 2001.
[26] B. Leibe and B. Schiele, “Interleaved Object Categorization and Segmentation,” Proc. British Machine Vision Conf., vol. II, pp. 264-271, 2003.
[27] A. Levin and Y. Weiss, “Learning to Combine Bottom-Up and Top-Down Segmentation,” Proc. European Conf. Computer Vision, vol. IV, pp. 581-594, 2006.
[28] P. Meer and B. Georgescu, “Edge Detection with Embedded Confidence,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 12, pp. 1351-1365, Dec. 2001.
[29] A. Opelt, A. Pinz, and A. Zisserman, “Incremental Learning of Object Detectors Using a Visual Shape Alphabet,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. I, pp. 3-10, 2006.
[30] M. Prasad, A. Zisserman, A. Fitzgibbon, M.P. Kumar, and P.H.S. Torr, “Learning Class-Specific Edges for Object Detection and Segmentation,” Proc. Indian Conf. Computer Vision, Graphics, and Image Processing, 2006.
[31] D. Ramanan, “Using Segmentation to Verify Object Hypothesis,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[32] D. Ramanan and D.A. Forsyth, “Using Temporal Coherence to Build Models of Animals,” Proc. IEEE Int'l Conf. Computer Vision, pp. 338-345, 2003.
[33] J. Rihan, P. Kohli, and P.H.S. Torr, “OBJCUT for Face Detection,” Proc. Indian Conf. Computer Vision, Graphics and Image Processing, 2006.
[34] C. Rother, V. Kolmogorov, and A. Blake, “GrabCut: Interactive Foreground Extraction Using Iterated Graph Cuts,” Proc. ACM SIGGRAPH, pp. 309-314, 2004.
[35] T. Schoenemann and D. Cremers, “Globally Optimal Image Segmentation with an Elastic Shape Prior,” Proc. IEEE Int'l Conf. Computer Vision, 2007.
[36] T. Schoenemann and D. Cremers, “Globally Optimal Shape-Based Tracking in Real-Time,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[37] J. Shotton, A. Blake, and R. Cipolla, “Contour-Based Learning for Object Detection,” Proc. IEEE Int'l Conf. Computer Vision, vol. I, pp. 503-510, 2005.
[38] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation,” Proc. European Conf. Computer Vision, vol. I, pp. 1-15, 2006.
[39] B. Stenger, A. Thayananthan, P.H.S. Torr, and R. Cipolla, “Hand Pose Estimation Using Hierarchical Detection,” Proc. Int'l Workshop Human-Computer Interaction, pp. 105-116, 2004.
[40] A. Thayananthan, B. Stenger, P.H.S. Torr, and R. Cipolla, “Shape Context and Chamfer Matching in Cluttered Scenes,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. I, pp. 127-133, 2003.
[41] A. Torralba, K.P. Murphy, and W.T. Freeman, “Sharing Visual Features for Multiclass and Multiview Object Detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 5, pp. 854-869, May 2007.
[42] K. Toyama and A. Blake, “Probabilistic Tracking in a Metric Space,” Proc. IEEE Int'l Conf. Computer Vision, vol. II, pp. 50-57, 2001.
[43] M. Varma and A. Zisserman, “Texture Classification: Are Filter Banks Necessary?” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. II, pp. 691-698, 2003.
[44] J. Winn and N. Jojic, “LOCUS: Learning Object Classes with Unsupervised Segmentation,” Proc. IEEE Int'l Conf. Computer Vision, vol. I, pp. 756-763, 2005.
[45] J. Winn and J. Shotton, “The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. I, pp. 37-44, 2006.
[46] J. Yedidia, W. Freeman, and Y. Weiss, “Bethe Free Energy, Kikuchi Approximations, and Belief Propagation Algorithms,” Technical Report TR2001-16, Mitsubishi Electric Research Laboratories (MERL), 2001.