Feature Extraction Based on ICA for Binary Classification Problems
November/December 2003 (vol. 15 no. 6)
pp. 1374-1388

Abstract: In supervised learning and similar data-manipulation tasks, we often extract new features from the original features to reduce the dimension of the feature space and achieve better performance. In this paper, we show how standard algorithms for independent component analysis (ICA) can be augmented with binary class labels to produce two groups of features: those that carry no information about the class labels, which are discarded, and those that do. We also provide a local stability analysis of the proposed algorithm. The advantage is that general ICA algorithms become available for feature extraction in classification problems by maximizing the joint mutual information between class labels and new features, although only for two-class problems. Using the new features, we can greatly reduce the dimension of the feature space without degrading classifier performance.
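
The criterion named in the abstract, mutual information between class labels and a feature, can be illustrated with a small sketch. This is not the authors' algorithm: it is a plain histogram (plug-in) estimate, and the function name `mutual_information`, the bin count, and the synthetic data are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def mutual_information(feature, labels, bins=8):
    """Histogram (plug-in) estimate of I(feature; label) in bits.

    Hypothetical helper for illustration; not taken from the paper.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    # Discretize the feature into equal-width bins.
    binned = [min(int((x - lo) / width), bins - 1) for x in feature]
    n = len(feature)
    joint = Counter(zip(binned, labels))   # counts of (bin, class) pairs
    count_f = Counter(binned)              # marginal counts per bin
    count_c = Counter(labels)              # marginal counts per class
    mi = 0.0
    for (b, c), count in joint.items():
        p_bc = count / n
        # p(b, c) / (p(b) * p(c)), written with raw counts
        mi += p_bc * math.log2(p_bc * n * n / (count_f[b] * count_c[c]))
    return mi


# Synthetic two-class data (assumed, for demonstration only).
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(2000)]
# A feature whose mean depends on the class: it carries label information.
informative = [2.0 * c + rng.gauss(0.0, 0.5) for c in labels]
# A feature drawn independently of the class: it carries almost none.
noise = [rng.gauss(0.0, 1.0) for _ in labels]

mi_informative = mutual_information(informative, labels)
mi_noise = mutual_information(noise, labels)
print(f"I(informative; c) = {mi_informative:.3f} bits")
print(f"I(noise; c)       = {mi_noise:.3f} bits")
```

In this toy setting the class-dependent feature scores far higher than the independent one, which is the distinction the abstract draws between features that are kept and features that are discarded.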

Index Terms:
Feature extraction, ICA, stability, classification.
Nojun Kwak, Chong-Ho Choi, "Feature Extraction Based on ICA for Binary Classification Problems," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 6, pp. 1374-1388, Nov.-Dec. 2003, doi:10.1109/TKDE.2003.1245279