

Issue No. 2 - February (2009, vol. 31)

pp. 228-244

Manuela Vasconcelos , UCSD, La Jolla

Nuno Vasconcelos , UCSD, La Jolla

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2008.77

ABSTRACT

Low-complexity feature selection is analyzed in the context of visual recognition. It is hypothesized that high-order dependencies of bandpass features contain little information for the discrimination of natural images. This hypothesis is characterized formally by the introduction of the concepts of conjunctive interference and decomposability order of a feature set. Necessary and sufficient conditions for the feasibility of low-complexity feature selection are then derived in terms of these concepts. It is shown that the intrinsic complexity of feature selection is determined by the decomposability order of the feature set and not its dimension. Feature selection algorithms are then derived for all levels of complexity and are shown to be approximated by existing information-theoretic methods, which they consistently outperform. The new algorithms are also used to objectively test the hypothesis of low decomposability order through comparison of classification performance. It is shown that, for image classification, the gain of modeling feature dependencies has strongly diminishing returns: best results are obtained under the assumption of decomposability order 1. This suggests a generic law for bandpass features extracted from natural images: that the effect, on the dependence of any two features, of observing any other feature is constant across image classes.
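The best-performing case reported above, decomposability order 1, amounts to selecting features from their marginal statistics alone, without modeling inter-feature dependencies. As a rough illustration (a minimal sketch, not the paper's algorithm), the code below ranks features by a histogram plug-in estimate of the mutual information between each individual feature and the class label and keeps the top-k; the function names, the binning scheme, and the bin count are illustrative assumptions.

```python
# Sketch of order-1 (marginal) information-theoretic feature selection:
# score each feature independently by I(feature; class) and keep the
# highest-scoring ones.  Binning and names are illustrative choices.
import numpy as np

def marginal_mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) in bits for a scalar feature x and
    discrete labels y, using a joint histogram."""
    # Discretize x into `bins` cells via interior histogram edges.
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    x_binned = np.digitize(x, edges)            # values in 0..bins-1
    classes = {c: i for i, c in enumerate(np.unique(y))}
    joint = np.zeros((bins, len(classes)))
    for xi, yi in zip(x_binned, y):
        joint[xi, classes[yi]] += 1
    p_xy = joint / joint.sum()                  # joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)       # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)       # marginal of y
    nz = p_xy > 0                               # avoid log(0)
    return float((p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def select_features(X, y, k):
    """Rank the columns of X by marginal MI with y; return top-k indices."""
    scores = [marginal_mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]
```

Because each feature is scored in isolation, the cost grows linearly in the number of features; modeling higher decomposability orders would instead require estimating joint densities over feature subsets, whose cost grows combinatorially.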

INDEX TERMS

Feature extraction and construction, low complexity, natural image statistics, information theory, feature discrimination versus dependence, image databases, object recognition, texture, perceptual reasoning.

CITATION

Manuela Vasconcelos, Nuno Vasconcelos, "Natural Image Statistics and Low-Complexity Feature Selection",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 31, no. 2, pp. 228-244, February 2009, doi:10.1109/TPAMI.2008.77