Issue No. 06 - June (2012 vol. 34)
M. Dorr, Med. Sch., Dept. of Ophthalmology, Harvard Univ., Boston, MA, USA
E. Vig, Inst. for Neuro- and Bioinf., Univ. of Lübeck, Lübeck, Germany
T. Martinetz, Inst. for Neuro- and Bioinf., Univ. of Lübeck, Lübeck, Germany
E. Barth, Inst. for Neuro- and Bioinf., Univ. of Lübeck, Lübeck, Germany
Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.
video signal processing, computer vision, image representation, image resolution, iris recognition, learning (artificial intelligence), natural scenes, data labeling, intrinsic dimensionality, natural dynamic scenes saliency, visual attention-based computer vision applications, biologically inspired models, naturalistic scenes, eye movement prediction, signal processing, image representations, image coding principles, machine learning, single-scale saliency maps, grayscale videos, spatiotemporal multiscale representations, spatiotemporal multispectral representations, high-resolution videos, supervised learning techniques, naturalistic videos, Videos, Computational modeling, Biological system modeling, Visualization, Predictive models, Image color analysis, Feature extraction, interest point detection, Computational models of vision, video analysis, spatiotemporal saliency, intrinsic dimension, visual attention
M. Dorr, E. Vig, T. Martinetz, E. Barth, "Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 6, pp. 1080-1091, June 2012, doi:10.1109/TPAMI.2011.198