Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes
June 2012 (vol. 34 no. 6)
pp. 1080-1091
M. Dorr, Med. Sch., Dept. of Ophthalmology, Harvard Univ., Boston, MA, USA
E. Vig, Inst. for Neuro- and Bioinf., Univ. of Lübeck, Lübeck, Germany
T. Martinetz, Inst. for Neuro- and Bioinf., Univ. of Lübeck, Lübeck, Germany
E. Barth, Inst. for Neuro- and Bioinf., Univ. of Lübeck, Lübeck, Germany
Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.

Index Terms:
video signal processing, computer vision, image representation, image resolution, iris recognition, learning (artificial intelligence), natural scenes, data labeling, intrinsic dimensionality, natural dynamic scenes saliency, visual attention-based computer vision applications, biologically inspired models, naturalistic scenes, eye movement prediction, signal processing, image representations, image coding principles, machine learning, single-scale saliency maps, grayscale videos, spatiotemporal multiscale representations, spatiotemporal multispectral representations, high-resolution videos, supervised learning techniques, naturalistic videos, videos, computational modeling, biological system modeling, visualization, predictive models, image color analysis, feature extraction, interest point detection, computational models of vision, video analysis, spatiotemporal saliency, intrinsic dimension, visual attention
M. Dorr, E. Vig, T. Martinetz, E. Barth, "Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 6, pp. 1080-1091, June 2012, doi:10.1109/TPAMI.2011.198