Attention-Based Dynamic Visual Search Using Inner-Scene Similarity: Algorithms and Bounds
February 2006 (vol. 28 no. 2)
pp. 251-264
A visual search is required when applying a recognition process to a scene containing multiple objects. In such cases, we would like to avoid an exhaustive sequential search. This work proposes a dynamic visual search framework based mainly on inner-scene similarity. Given a number of candidates (e.g., subimages), we hypothesize that more visually similar candidates are more likely to have the same identity. We use this assumption to determine the order of attention. Both deterministic and stochastic approaches relying on this hypothesis are considered. Under the deterministic approach, we suggest a measure, similar to Kolmogorov's epsilon-covering, that quantifies the difficulty of a search task. We show that this measure bounds the performance of all search algorithms and suggest a simple algorithm that meets this bound. Under the stochastic approach, we model the identities of the candidates as a set of correlated random variables and derive a search procedure based on linear estimation. Several experiments are presented in which the statistical characteristics, search algorithm, and bound are evaluated and verified.
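To make the idea of similarity-driven attention order concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that each candidate is described by a feature vector, that Euclidean distance stands in for visual dissimilarity, and that a costly recognizer is available as a callback. The ordering rule used here is a plain farthest-first traversal: after each rejected candidate, attention moves to the unexamined candidate that is least similar to everything rejected so far. The names `farthest_first_search`, `features`, and `is_target` are hypothetical.

```python
import math


def farthest_first_search(features, is_target):
    """Examine candidates in farthest-first order until a target is found.

    features  : list of equal-length numeric tuples, one per candidate (assumed)
    is_target : callable taking a candidate index, returning True or False;
                stands in for the expensive recognition process (assumed)

    Returns (index of the first target found or None, list of examined indices).
    """
    n = len(features)
    if n == 0:
        return None, []

    # min_dist[i] is the distance from candidate i to its nearest
    # already-examined (and rejected) candidate.
    min_dist = [math.inf] * n
    examined = [False] * n
    order = []

    current = 0  # arbitrary starting candidate (an assumption of this sketch)
    while True:
        examined[current] = True
        order.append(current)
        if is_target(current):
            return current, order

        # The rejected candidate makes its close neighbours less promising,
        # under the hypothesis that similar candidates share an identity.
        for i in range(n):
            if not examined[i]:
                d = math.dist(features[current], features[i])
                if d < min_dist[i]:
                    min_dist[i] = d

        remaining = [i for i in range(n) if not examined[i]]
        if not remaining:
            return None, order

        # Attend next to the candidate least similar to all rejected ones.
        current = max(remaining, key=lambda i: min_dist[i])


# Three similar non-targets near the origin and two candidates far away,
# one of which (index 4) is the target.
features = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]
result = farthest_first_search(features, lambda i: i == 4)
print(result)  # (4, [0, 4])
```

In this toy scene the recognizer is called twice, whereas a sequential scan in index order would call it five times. The benefit depends entirely on the similarity hypothesis holding; when targets and non-targets are visually intermixed, no ordering of this kind should be expected to help.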


Index Terms:
Computer vision, scene analysis, feature representation, similarity measures, performance evaluation of algorithms and systems, object recognition, visual search, attention.
Tamar Avraham, Michael Lindenbaum, "Attention-Based Dynamic Visual Search Using Inner-Scene Similarity: Algorithms and Bounds," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 251-264, Feb. 2006, doi:10.1109/TPAMI.2006.28