Issue No.06 - June (2010 vol.22)
pp: 854-867
Pradipta Maji, Indian Statistical Institute, Kolkata
Sankar K. Pal, Indian Statistical Institute, Kolkata
The selection of nonredundant and relevant features of real-valued data sets is a highly challenging problem. A novel feature selection method based on fuzzy-rough sets is presented here; it maximizes the relevance and minimizes the redundancy of the selected features. By introducing the fuzzy equivalence partition matrix, a novel representation of Shannon's entropy for fuzzy approximation spaces is proposed to measure the relevance and redundancy of features in real-valued data sets. The fuzzy equivalence partition matrix also offers an efficient way to calculate many other information measures, termed f-information measures. Several f-information measures are shown to be effective for selecting nonredundant and relevant features of real-valued data sets. This paper compares the performance of different f-information measures for feature selection in fuzzy approximation spaces. Some quantitative indexes based on fuzzy-rough sets are introduced for evaluating the performance of the proposed method. The effectiveness of the proposed method, along with a comparison with other methods, is demonstrated on a set of real-life data sets.
Index Terms: Pattern recognition, data mining, feature selection, fuzzy-rough sets, f-information measures.
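The abstract does not give formulas, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that a fuzzy equivalence partition matrix has one row per fuzzy class and one column per object, that each column sums to 1, that the frequency of a class is its mean membership, and that the fuzzy intersection is taken as the minimum. The Gaussian membership function, the function names, and the greedy "relevance minus mean redundancy" criterion are all assumptions made for illustration.

```python
import math

def fepm(values, centers, width):
    """Build a c x n partition matrix (assumed Gaussian memberships).

    Row i holds the membership of every object in fuzzy class i;
    each column is normalised so that it sums to 1.
    """
    columns = []
    for v in values:
        raw = [math.exp(-((v - c) ** 2) / (2 * width ** 2)) for c in centers]
        total = sum(raw)
        columns.append([r / total for r in raw])
    return [list(row) for row in zip(*columns)]  # transpose to c x n

def entropy(m):
    """Shannon entropy over class frequencies (mean row memberships)."""
    n = len(m[0])
    h = 0.0
    for row in m:
        lam = sum(row) / n
        if lam > 0:
            h -= lam * math.log2(lam)
    return h

def joint_entropy(ma, mb):
    """Joint entropy, assuming min as the fuzzy intersection."""
    n = len(ma[0])
    h = 0.0
    for ra in ma:
        for rb in mb:
            lam = sum(min(a, b) for a, b in zip(ra, rb)) / n
            if lam > 0:
                h -= lam * math.log2(lam)
    return h

def mutual_information(ma, mb):
    """I(A; B) = H(A) + H(B) - H(A, B)."""
    return entropy(ma) + entropy(mb) - joint_entropy(ma, mb)

def select_features(feature_mats, class_mat, k):
    """Greedy selection: maximise relevance minus mean redundancy."""
    relevance = [mutual_information(m, class_mat) for m in feature_mats]
    selected = []
    while len(selected) < min(k, len(feature_mats)):
        best, best_score = None, -math.inf
        for i, m in enumerate(feature_mats):
            if i in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(m, feature_mats[j]) for j in selected
                ) / len(selected)
            score = relevance[i] - redundancy
            if score > best_score:  # ties keep the lowest index
                best, best_score = i, score
        selected.append(best)
    return selected

# Toy example with crisp (0/1) matrices: four objects, four classes.
class_mat = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
f0 = [[1, 1, 0, 0], [0, 0, 1, 1]]   # separates objects {0,1} from {2,3}
f1 = [[1, 1, 0, 0], [0, 0, 1, 1]]   # exact duplicate of f0 (redundant)
f2 = [[1, 0, 1, 0], [0, 1, 0, 1]]   # complementary split
chosen = select_features([f0, f1, f2], class_mat, 2)
print(chosen)  # [0, 2]
```

In this toy case all three features are equally relevant to the class, but the duplicate f1 is penalised for its redundancy with f0, so the complementary feature f2 is chosen second. Other f-information measures could be substituted for `mutual_information` without changing the selection loop.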
Pradipta Maji, Sankar K. Pal, "Feature Selection Using f-Information Measures in Fuzzy Approximation Spaces", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 6, pp. 854-867, June 2010, doi:10.1109/TKDE.2009.124