2012 IEEE 12th International Conference on Data Mining Workshops
Evaluation of Feature Ranking Ensembles for High-Dimensional Biomedical Data: A Case Study
Brussels, Belgium
December 10, 2012
ISBN: 978-1-4673-5164-5
Developing accurate, reliable and easy-to-use diagnostic tests depends on identifying a small set of highly discriminative biomarkers. This task can be cast as feature selection within a pattern recognition context. Medical data are usually of the "wide" type, where the number of features is substantially larger than the number of instances. With the abundance of feature ranking methods, it is difficult to pick the most suitable one and to choose a final, consistent feature subset. Ensembles of ranking methods have been recommended for the task, but their stability and accuracy have not been evaluated across different ranking methods. Here we present a case study consisting of 429 samples of exhaled air from smokers, 83% of whom suffer from COPD (chronic obstructive pulmonary disease). The task is to identify a discriminative subset of the 1929 volatile organic compounds (VOCs) measured through mass spectrometry. Using Pareto analysis, 16 feature ranking ensembles were evaluated with respect to three criteria: classification accuracy, area under the ROC curve, and the stability of the ensemble selection. The t-statistic was rated the best among the 16 feature rankers, outperforming the currently favoured SVM ranker.
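The abstract does not say how the ranking ensembles were built or how individual rankings were combined, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a two-class problem with labels 0 and 1, a Welch t-statistic as the ranker, bootstrap resampling to create ensemble members, and mean-rank aggregation. The function names (`t_statistic`, `rank_features`, `ensemble_select`, `make_demo_data`) and the synthetic data are hypothetical and exist purely for illustration.

```python
import math
import random


def t_statistic(a, b):
    """Absolute Welch t-statistic between two lists of feature values."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    denom = math.sqrt(va / na + vb / nb)
    return abs(ma - mb) / denom if denom > 0 else 0.0


def rank_features(X, y):
    """Rank features by t-statistic; rank 0 is the most discriminative."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(t_statistic(pos, neg))
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for r, j in enumerate(order):
        ranks[j] = r
    return ranks


def ensemble_select(X, y, k, n_rounds=25, seed=0):
    """Select k features by summing ranks over bootstrap samples (assumed scheme)."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    total = [0] * n_features
    done = 0
    while done < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        # The t-statistic needs at least two samples per class; redraw otherwise.
        if ys.count(1) < 2 or ys.count(0) < 2:
            continue
        ranks = rank_features([X[i] for i in idx], ys)
        for j in range(n_features):
            total[j] += ranks[j]
        done += 1
    # A lower summed rank means the feature was ranked highly more consistently.
    return sorted(range(n_features), key=lambda j: total[j])[:k]


def make_demo_data(seed=1, n_per_class=20, n_features=10, informative=(0, 3)):
    """Synthetic data: only the `informative` features differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 4.0 * label  # large class shift, for illustration only
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print(sorted(ensemble_select(X, y, k=2)))
```

Summing ranks rather than raw scores keeps ensemble members on a common scale, which matters when rankers with different score ranges are mixed. Evaluating stability, accuracy and AUC, as the study does, would require repeating the selection over independent data splits and comparing the resulting subsets.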
Index Terms:
Support vector machines, Stability criteria, Accuracy, Indexes, Educational institutions, Vegetation, COPD, Feature selection, feature ranking, classifier ensembles, stability index
Citation:
Ludmila I. Kuncheva, Christopher J. Smith, Yasir Syed, Christopher O. Phillips, Keir E. Lewis, "Evaluation of Feature Ranking Ensembles for High-Dimensional Biomedical Data: A Case Study," 2012 IEEE 12th International Conference on Data Mining Workshops (ICDMW), pp. 49-56, 2012