Issue No. 06 - June (2010 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.124
Sankar K. Pal , Indian Statistical Institute, Kolkata
Pradipta Maji , Indian Statistical Institute, Kolkata
The selection of nonredundant and relevant features of real-valued data sets is a highly challenging problem. A novel feature selection method based on fuzzy-rough sets is presented here; it maximizes the relevance and minimizes the redundancy of the selected features. By introducing the fuzzy equivalence partition matrix, a novel representation of Shannon's entropy for fuzzy approximation spaces is proposed to measure the relevance and redundancy of features in a way suitable for real-valued data sets. The fuzzy equivalence partition matrix also offers an efficient way to calculate many more information measures, termed f-information measures. Several f-information measures are shown to be effective for selecting nonredundant and relevant features of real-valued data sets. This paper compares the performance of different f-information measures for feature selection in fuzzy approximation spaces. Some quantitative indexes based on fuzzy-rough sets are introduced for evaluating the performance of the proposed method. The effectiveness of the proposed method, along with a comparison with other methods, is demonstrated on a set of real-life data sets.
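The abstract gives no formulas, so the following is only a rough sketch of how an entropy computed from a fuzzy equivalence partition matrix could drive a relevance/redundancy selection. It rests on several assumptions that are not stated above: each feature is represented by a matrix with one row per fuzzy equivalence class and one column per object, class probabilities are taken as mean memberships, the joint distribution uses the minimum as the fuzzy intersection, mutual information stands in for the family of f-information measures, and features are picked greedily by relevance minus average redundancy. All function names are hypothetical.

```python
import math


def entropy(M):
    """Entropy of a fuzzy partition matrix M (rows: fuzzy classes, columns: objects).

    Assumption: the probability of a class is its mean membership over all objects.
    """
    n = len(M[0])
    lam = [sum(row) / n for row in M]
    return -sum(l * math.log2(l) for l in lam if l > 0)


def joint_entropy(P, Q):
    """Joint entropy of two fuzzy partition matrices over the same objects.

    Assumption: the fuzzy intersection of two classes is the element-wise minimum.
    """
    n = len(P[0])
    h = 0.0
    for p in P:
        for q in Q:
            lam = sum(min(a, b) for a, b in zip(p, q)) / n
            if lam > 0:
                h -= lam * math.log2(lam)
    return h


def mutual_information(P, Q):
    """Mutual information, used here as one example of an information measure."""
    return entropy(P) + entropy(Q) - joint_entropy(P, Q)


def select_features(features, target, k):
    """Greedily pick k feature names, scoring relevance minus mean redundancy.

    `features` maps a name to its partition matrix; `target` is the partition
    matrix of the class labels. The scoring rule is an assumed criterion.
    """
    selected = []
    remaining = list(features)

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: four objects, two classes; memberships are crisp for readability.
target = [[1, 1, 0, 0], [0, 0, 1, 1]]
features = {
    "a": [[1, 1, 0, 0], [0, 0, 1, 1]],  # matches the class partition
    "b": [[1, 0, 1, 0], [0, 1, 0, 1]],  # unrelated to the class partition
}
first_pick = select_features(features, target, 1)
```

With crisp (0/1) memberships these quantities reduce to the ordinary Shannon entropy and mutual information, which makes the sketch easy to sanity-check before trying graded memberships.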
Pattern recognition, data mining, feature selection, fuzzy-rough sets, f-information measures.
Sankar K. Pal, Pradipta Maji, "Feature Selection Using f-Information Measures in Fuzzy Approximation Spaces", IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 6, pp. 854-867, June 2010, doi:10.1109/TKDE.2009.124