Issue No. 03 - March (2017 vol. 39)
Jimei Yang, Adobe Research, San Jose, CA
Ming-Hsuan Yang, School of Engineering, University of California, Merced, CA
Top-down visual saliency is an important module of visual attention. In this work, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a visual dictionary. The proposed model has a layered structure, from top to bottom: CRF, sparse coding, and image patches. With sparse coding as an intermediate layer, the CRF is learned in a feature-adaptive manner; meanwhile, with the CRF as the output layer, the dictionary is learned under structured supervision. For efficient and effective joint learning, we develop a max-margin approach based on a stochastic gradient descent algorithm. Experimental results on the Graz-02 and PASCAL VOC datasets show that our model performs favorably against state-of-the-art top-down saliency methods for target object localization. In addition, updating the dictionary significantly improves the performance of our model. We demonstrate the merits of the proposed top-down saliency model by applying it to two tasks: prioritizing object proposals for detection and predicting human fixations.
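The layered idea in the abstract (image patch, then sparse code, then a max-margin scorer) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes ISTA as the sparse coder, replaces the CRF with a single node-level linear scorer (no pairwise terms), and leaves out the dictionary update entirely. The names `sparse_code`, `hinge_sgd_step`, `lam`, `lr`, and `reg`, as well as the random data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage operator used by ISTA."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam=0.1, n_iter=100):
    """ISTA for min_s 0.5 * ||x - D s||^2 + lam * ||s||_1 (assumed coder)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ s - x)
        s = soft_threshold(s - grad / L, lam / L)
    return s

def hinge_sgd_step(w, s, y, lr=0.1, reg=1e-3):
    """One SGD step on reg/2 * ||w||^2 + max(0, 1 - y * w.s), y in {-1, +1}.

    Stand-in for the max-margin objective: node-level only, no CRF
    pairwise terms and no gradient with respect to the dictionary.
    """
    grad = reg * w
    if y * (w @ s) < 1.0:  # margin violated, so the hinge term is active
        grad = grad - y * s
    return w - lr * grad

# Hypothetical toy data: a 16-dim patch feature and a 32-atom dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
x = rng.standard_normal(16)

s = sparse_code(x, D)           # intermediate layer: sparse code of the patch
w = np.zeros(32)                # output layer: linear saliency weights
w = hinge_sgd_step(w, s, y=+1)  # treat this patch as a positive (target) example
score = w @ s                   # saliency score for the patch
```

In the joint setting the abstract describes, the dictionary `D` would also be updated from the structured loss; the sketch keeps `D` fixed to stay short.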
Visualization, Predictive models, Dictionaries, Computational modeling, Context, Context modeling, Prediction algorithms
J. Yang and M.-H. Yang, "Top-Down Visual Saliency via Joint CRF and Dictionary Learning," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 3, pp. 576-588, Mar. 2017.