Learning to Detect a Salient Object
February 2011 (vol. 33 no. 2)
pp. 353-367
Tie Liu, Xi'an Jiaotong University, Xi'an and IBM Research-China, Beijing
Zejian Yuan, Xi'an Jiaotong University, Xi'an
Jian Sun, Microsoft Research Asia, Beijing
Jingdong Wang, Microsoft Research Asia, Beijing
Nanning Zheng, Xi'an Jiaotong University, Xi'an
Xiaoou Tang, Chinese University of Hong Kong, Hong Kong
Heung-Yeung Shum, Microsoft, Redmond
In this paper, we study the salient object detection problem for images. We formulate this problem as a binary labeling task in which we separate the salient object from the background. We propose a set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. Further, we extend the proposed approach to detect a salient object from sequential images by introducing dynamic salient features. We collected a large image database containing tens of thousands of images carefully labeled by multiple users, as well as a video segment database, and conducted a set of experiments on both to demonstrate the effectiveness of the proposed approach.
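
The abstract names multiscale contrast as the local feature but does not define it. As a rough illustration only, the sketch below assumes that local contrast is the sum of squared intensity differences between a pixel and its window neighbors, accumulated over a simple image pyramid and normalized to [0, 1]. The function names, the number of levels, the window radius, and the 2x2 box-filter pyramid are all hypothetical choices made for this example, not details taken from the paper.

```python
def downsample(img):
    """Halve a grayscale image (list of rows) by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def local_contrast(img, radius=1):
    """Sum of squared differences between each pixel and its window neighbors."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += (img[y][x] - img[ny][nx]) ** 2
            out[y][x] = total
    return out


def multiscale_contrast(img, levels=3, radius=1):
    """Accumulate local contrast over pyramid levels; normalize to [0, 1]."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    level_img = img
    for level in range(levels):
        contrast = local_contrast(level_img, radius)
        lh, lw = len(contrast), len(contrast[0])
        scale = 2 ** level
        # Nearest-neighbor upsampling of this level back to full resolution.
        for y in range(h):
            for x in range(w):
                acc[y][x] += contrast[min(y // scale, lh - 1)][min(x // scale, lw - 1)]
        if lh < 2 or lw < 2:
            break  # image too small to halve again
        level_img = downsample(level_img)
    peak = max(max(row) for row in acc)
    if peak > 0:
        acc = [[v / peak for v in row] for row in acc]
    return acc


# Toy input (assumed for illustration): a bright 2x2 block on a dark 8x8 background.
image = [[0.0] * 8 for _ in range(8)]
for y in (3, 4):
    for x in (3, 4):
        image[y][x] = 1.0
contrast_map = multiscale_contrast(image)
```

On this toy input the response peaks on the bright block and is close to zero in the corners, which is the behavior a local contrast cue is meant to have. A feature map of this kind would be only one input to the learned conditional random field that the abstract describes.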

Index Terms:
Salient object detection, conditional random field, visual attention, saliency map.
Citation:
Tie Liu, Zejian Yuan, Jian Sun, Jingdong Wang, Nanning Zheng, Xiaoou Tang, Heung-Yeung Shum, "Learning to Detect a Salient Object," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 353-367, Feb. 2011, doi:10.1109/TPAMI.2010.70