DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.70
Hsi-Che Liu, Mackay Medical College and Division of Pediatric Hematology-Oncology, Mackay Memorial Hospital, New Taipei
Pei-Chen Peng, National Taiwan University, Taipei
Tzung-Chien Hsieh, National Taiwan University, Taipei
Ting-Chi Yeh, Mackay Medical College and Division of Pediatric Hematology-Oncology, Mackay Memorial Hospital, New Taipei
Chih-Jen Lin, National Taiwan University, Taipei
Chien-Yu Chen, National Taiwan University, Taipei
Jen-Yin Hou, Mackay Medical College and Division of Pediatric Hematology-Oncology, Mackay Memorial Hospital, New Taipei
Lee-Yung Shih, Chang Gung Memorial Hospital, Taipei, and Chang Gung University, Taoyuan
Der-Cherng Liang, Mackay Medical College and Division of Pediatric Hematology-Oncology, Mackay Memorial Hospital, New Taipei
The amount of microarray gene expression data has grown exponentially. To exploit these data in larger studies, integrated analysis of cross-laboratory (cross-lab) data has become a trend, which makes choosing an appropriate feature selection method an essential issue. This paper focuses on feature selection for Affymetrix (Affy) microarray studies across different labs. We investigate four feature selection methods: t-test, Significance Analysis of Microarrays (SAM), Rank Products (RP) and Random Forest (RF). The four methods are applied to Affy data for acute lymphoblastic leukemia, acute myeloid leukemia, breast cancer and lung cancer, each of which consists of three cross-lab data sets. We use a rank-based normalization method to reduce the bias between cross-lab data sets. Balanced accuracy and true positive rate are used to evaluate prediction. This study provides comprehensive comparisons of the four feature selection methods in cross-lab microarray analysis. Results show that SAM has the best classification performance. RF also achieves high classification accuracy, but it is not as stable as SAM. The t-test is the most naive method, and its performance is the worst of the four. We further discuss the influence of the number of samples and of selected genes, and the issue of unbalanced data sets.
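The abstract names several pipeline ingredients without spelling them out. The following minimal Python sketch illustrates three of them under stated assumptions. A per-array rank transform stands in for the rank-based normalization (the paper's exact scheme is not given here). Genes are ranked by the absolute Welch t-statistic as one plausible reading of the t-test baseline. Balanced accuracy is computed as the mean per-class recall, and true positive rate as the recall of the positive class. All function names are hypothetical, and SAM, Rank Products and Random Forest are not shown.

```python
import math
from statistics import mean, variance


def rank_normalize(sample):
    """Replace each value in one array by its 1-based rank, averaging tied ranks.

    Hypothetical stand-in for a rank-based normalization: only the within-array
    ordering of genes survives, so any monotone lab-specific intensity
    distortion drops out.
    """
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    i = 0
    while i < len(order):
        # Extend j over a block of tied values starting at position i.
        j = i
        while j + 1 < len(order) and sample[order[j + 1]] == sample[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def welch_t(a, b):
    """Welch's t-statistic for group a vs. group b (each needs >= 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups constant: infinitely separated unless the means agree.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_genes_by_t(samples, labels, k):
    """Indices of the k genes with the largest |t| after rank normalization.

    samples: list of arrays (one list of gene values per sample);
    labels: 1 for the positive class, 0 otherwise.
    """
    normed = [rank_normalize(s) for s in samples]
    scores = []
    for g in range(len(normed[0])):
        pos = [s[g] for s, y in zip(normed, labels) if y == 1]
        neg = [s[g] for s, y in zip(normed, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recall[c] = sum(1 for i in idx if y_pred[i] == c) / len(idx)
    return recall


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; for two classes this is (TPR + TNR) / 2."""
    recall = per_class_recall(y_true, y_pred)
    return sum(recall.values()) / len(recall)


def true_positive_rate(y_true, y_pred, positive=1):
    """Recall of the positive class: TP / (TP + FN)."""
    return per_class_recall(y_true, y_pred)[positive]


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 1]
    print(balanced_accuracy(y_true, y_pred))  # 0.625
    print(true_positive_rate(y_true, y_pred))  # 0.75
```

In the usage example, plain accuracy is 4/6 ≈ 0.67, while balanced accuracy is (0.75 + 0.5) / 2 = 0.625. Averaging recall per class keeps the minority class from being swamped, which is why a class-averaged metric suits unbalanced data sets.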
Cross-laboratory experiment, Microarray data analysis, Feature selection, Cancer
L. Shih et al., "Comparison of Feature Selection Methods for Cross-Laboratory Microarray Analysis," in IEEE/ACM Transactions on Computational Biology and Bioinformatics.