Evaluating Stability and Comparing Output of Feature Selectors that Optimize Feature Subset Cardinality
Issue No. 11 - November (2010 vol. 32)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2010.34
Petr Somol , Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague
Jana Novovičová , Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague
Stability (robustness) of feature selection methods is a topic of recent interest whose importance is often neglected, despite its direct impact on the reliability of machine learning systems. We investigate the problem of evaluating the stability of feature selection processes that yield subsets of varying size. We introduce several novel feature selection stability measures and adjust some existing measures within a unifying framework that offers broad insight into the stability problem. We study in detail the properties of the considered measures and demonstrate on various examples what information about the feature selection process can be gained. We also introduce an alternative approach to feature selection evaluation in the form of measures that enable comparing the similarity of two feature selection processes, e.g., the output of two feature selection methods or two runs of one method with different parameters. The information obtained using the considered stability and similarity measures is shown to be usable for assessing feature selection methods (or criteria) in their own right.
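To make the notion of subset-similarity-based stability concrete, the following is a minimal sketch of one widely used measure of this kind: the average pairwise Tanimoto (Jaccard) index over the feature subsets produced by repeated selection runs, which naturally handles subsets of differing cardinality. The function name and the toy subsets are illustrative, not taken from the paper.

```python
from itertools import combinations

def average_tanimoto_index(subsets):
    """Average pairwise Tanimoto (Jaccard) similarity over a collection
    of selected feature subsets (given as Python sets of feature indices).
    Handles subsets of differing cardinality; returns a value in [0, 1],
    where 1 means every run selected the identical subset."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Feature subsets selected in three hypothetical runs (toy data).
runs = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3, 4}]
print(round(average_tanimoto_index(runs), 3))  # → 0.433
```

Because each pairwise term normalizes intersection size by union size, the measure does not require the selector to return subsets of a fixed, predefined cardinality.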
Feature selection, feature stability, stability measures, similarity measures, sequential search, individual ranking, feature subset-size optimization, high dimensionality, small sample size.
J. Novovičová and P. Somol, "Evaluating Stability and Comparing Output of Feature Selectors that Optimize Feature Subset Cardinality," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 11, pp. 1921-1939, 2010.