Input Variable Selection: Mutual Information and Linear Mixing Measures
January 2006 (vol. 18 no. 1)
pp. 37-46
Determining the most appropriate inputs to a model has a significant impact on the performance of the model and associated algorithms for classification, prediction, and data analysis. Previously, we proposed an algorithm, ICAIVS, which utilizes independent component analysis (ICA) as a preprocessing stage to overcome issues of dependencies between inputs before the data are passed to an input variable selection (IVS) stage. While we demonstrated previously with artificial data that ICA can prevent an overestimation of necessary input variables, we show here that mixing between input variables is common in real-world data sets, so that ICA preprocessing is useful in practice. This experimental test is based on new measures introduced in this paper. Furthermore, we extend the implementation of our variable selection scheme to a statistical dependency test based on mutual information and test several algorithms on Gaussian and sub-Gaussian signals. Specifically, we propose a novel method of quantifying linear dependencies using ICA estimates of mixing matrices with a new Linear Mixing Measure (LMM).
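The following is a minimal sketch of the general pipeline the abstract describes: unmix the candidate inputs with ICA, score each recovered source against the target with a mutual-information estimate, and quantify how strongly the original inputs were linearly mixed from the estimated mixing matrix. It assumes scikit-learn's FastICA and mutual_info_regression as convenient stand-ins; the thresholded MI test and the mixing_score function below are illustrative only and are not the paper's ICAIVS dependency test or its published LMM definition.

    # Sketch: ICA preprocessing followed by a mutual-information-based input
    # selection step, plus an illustrative proxy for a "linear mixing" score.
    import numpy as np
    from sklearn.decomposition import FastICA
    from sklearn.feature_selection import mutual_info_regression

    def select_inputs_via_ica(X, y, mi_threshold=0.05, random_state=0):
        """Return indices of ICA sources whose estimated MI with y exceeds a threshold."""
        ica = FastICA(random_state=random_state)
        S = ica.fit_transform(X)            # recovered (approximately independent) sources
        mi = mutual_info_regression(S, y)   # nonparametric MI estimate per source
        selected = np.flatnonzero(mi > mi_threshold)
        return selected, mi, ica.mixing_    # mixing_ maps sources back to observed inputs

    def mixing_score(A):
        """Illustrative scalar in [0, 1]: 0 if each input loads on a single source
        (no mixing), approaching 1 as loadings spread evenly across sources.
        A plausible proxy for the idea of an LMM, not the published measure."""
        W = np.abs(A) / np.abs(A).sum(axis=1, keepdims=True)    # row-normalized loadings
        off_peak = 1.0 - W.max(axis=1)                          # mass not on the dominant source
        return off_peak.mean() * A.shape[1] / (A.shape[1] - 1)  # uniform loadings -> 1

    # Hypothetical example with a deliberately mixed pair of inputs:
    rng = np.random.default_rng(0)
    s = rng.uniform(-1.0, 1.0, size=(1000, 2))       # independent sub-Gaussian sources
    X = s @ np.array([[1.0, 0.6], [0.2, 1.0]])       # linearly mixed observed inputs
    y = np.sin(s[:, 0])                              # target depends on one source only
    idx, mi, A = select_inputs_via_ica(X, y)
    print(idx, mi.round(3), round(mixing_score(A), 3))

Note that ICA recovers sources only up to permutation and scaling, so which recovered source carries the high MI score varies from run to run; the point of the sketch is the ordering of steps, not a specific result.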

[1] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, 1997.
[2] I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Machine Learning Research, vol. 3, pp. 1157-1182, Mar. 2003.
[3] G.H. John, R. Kohavi, and K. Pfleger, “Irrelevant Features and the Subset Selection Problem,” Proc. Int'l Conf. Machine Learning, pp. 121-129, 1994; journal version in Artificial Intelligence, available at http://citeseer.nj.nec.com/13663.html.
[4] A.D. Back and T.P. Trappenberg, “Selecting Inputs for Modeling Using Normalized Higher Order Statistics and Independent Component Analysis,” IEEE Trans. Neural Networks, vol. 12, no. 3, pp. 612-617, May 2001.
[5] L.A. Rendell and R. Seshu, “Learning Hard Concepts through Constructive Induction: Framework and Rationale,” Computational Intelligence, vol. 6, no. 4, pp. 247-270, 1990.
[6] A.A. Freitas, “Understanding the Crucial Role of Attribute Interaction in Data Mining,” Artificial Intelligence Rev., vol. 16, no. 3, pp. 177-199, Nov. 2001.
[7] S. Letourneau, “Identification of Attribute Interactions and Generation of Globally Relevant Continuous Features in Machine Learning,” PhD thesis, School of Information Technology and Eng., Univ. of Ottawa, Ottawa, Ontario, Canada, Aug. 2003.
[8] B.V. Bonnlander and A.S. Weigend, “Selecting Input Variables Using Mutual Information and Nonparametric Density Estimation,” Proc. 1994 Int'l Symp. Artificial Neural Networks (ISANN '94), pp. 42-50, 1994.
[9] H.H. Yang and J.E. Moody, “Data Visualization and Feature Selection: New Algorithms for Nongaussian Data,” Advances in Neural Information Processing Systems, T.K. Leen, S.A. Solla, and K.-R. Muller, eds., vol. 12, MIT Press, 2000.
[10] G.A. Darbellay and I. Vajda, “Estimation of the Information by an Adaptive Partitioning of the Observation Space,” IEEE Trans. Information Theory, vol. 45, no. 4, pp. 1315-1321, May 1999.
[11] S. Blinnikov and R. Moessner, “Expansions for Nearly Gaussian Distributions,” Astronomy and Astrophysics, Supplement Series, vol. 130, pp. 193-205, 1998.
[12] S. Amari, A. Cichocki, and H.H. Yang, “A New Learning Algorithm for Blind Signal Separation,” Advances in Neural Information Processing Systems, D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, eds., vol. 8, pp. 757-763, The MIT Press, 1996.
[13] S. Akaho, Y. Kiuchi, and S. Umeyama, “MICA: Multimodal Independent Component Analysis,” Proc. Int'l Joint Conf. Neural Networks, 1999.
[14] S. Amari and H. Nagaoka, Methods of Information Geometry. AMS and Oxford Univ. Press, 2000.
[15] E.A. Robinson, Probability Theory and Applications. Int'l Human Resources Development Corp., 1985.
[16] J. Ouyang, “Improved ICAIVS Algorithm with Mutual Information,” master's thesis, Dalhousie Univ., 2004.
[17] S. Chib and E. Greenberg, “Understanding the Metropolis-Hastings Algorithm,” The Am. Statistician, vol. 49, no. 4, pp. 327-335, Nov. 1995.
[18] StatLib Data Sets Archive, http://lib.stat.cmu.edu/datasets/, 2005.
[19] C.E. Rasmussen, R.M. Neal, G. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra, and R. Tibshirani, “Data for Evaluating Learning in Valid Experiments,” http://www.cs.toronto.edu/~delve, 2005.
[20] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
[21] S.G. Makridakis, S.C. Wheelwright, and R.J. Hyndman, Forecasting: Methods and Applications, third ed. John Wiley & Sons, 1998.
[22] T.W.S. Chow and D. Huang, “Estimating Optimal Feature Subsets Using Efficient Estimation of High-Dimensional Mutual Information,” IEEE Trans. Neural Networks, vol. 16, no. 1, pp. 213-224, 2005.

Index Terms:
Input variable selection, modeling, data preprocessing, independent component analysis, mutual information estimation.
Citation:
Thomas Trappenberg, Jie Ouyang, Andrew Back, "Input Variable Selection: Mutual Information and Linear Mixing Measures," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 37-46, Jan. 2006, doi:10.1109/TKDE.2006.11