Issue No. 4 - July/August 2011 (vol. 8)
pp. 1148-1151
Tim Peters , Macquarie University, Sydney
David W. Bulger , Macquarie University, Sydney
To-Ha Loi , St. Vincent's Hospital and St. Vincent's Centre for Applied Medical Research, Sydney
Jean Yee Hwa Yang , University of Sydney, Sydney
David Ma , St. Vincent's Hospital and St. Vincent's Centre for Applied Medical Research, Sydney
Current feature selection methods for the supervised classification of tissue samples from microarray data generally fail to exploit the complementary discriminatory power that can be found in sets of features. Using a feature selection method built on the computational architecture of the cross-entropy method, with an additional preliminary step that guarantees a lower bound on the number of times any feature is considered, we show on a human lymph node data set that a significant number of genes perform well when their complementary power is assessed, yet "pass under the radar" of popular feature selection methods that assess genes only individually against a given classification tool. We also show that this phenomenon becomes more apparent as the diagnostic specificity of the tissue samples analysed increases.
Feature selection, microarray, data mining, genetic interdependence, lymphoma.
Tim Peters, David W. Bulger, To-Ha Loi, Jean Yee Hwa Yang, David Ma, "Two-Step Cross-Entropy Feature Selection for Microarrays—Power Through Complementarity", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 4, pp. 1148-1151, July/August 2011, doi:10.1109/TCBB.2011.30
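The abstract describes two ingredients: a cross-entropy search over feature subsets, and a preliminary pass that guarantees every feature is evaluated a minimum number of times before the main optimization begins. The sketch below illustrates that two-step structure on synthetic data; it is not the authors' implementation, and the sampling probabilities, classifier (leave-one-out nearest centroid), and all parameter values are assumptions chosen for illustration.

```python
# Hedged sketch of two-step cross-entropy feature selection.
# Step 1 (preliminary): force under-sampled genes into subsets until each
# has appeared at least `min_evals` times. Step 2 (main loop): sample
# subsets from per-gene inclusion probabilities, score them, and smooth-
# update the probabilities toward the elite subsets. All names/parameters
# here are illustrative assumptions, not the paper's exact choices.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "microarray": 40 samples x 100 genes, two classes.
# Genes 0-4 carry a modest class-1 shift, so they discriminate best jointly.
n_samples, n_genes = 40, 100
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :5] += 0.8

def score(subset):
    """Leave-one-out nearest-centroid accuracy using only the chosen genes."""
    cols = np.flatnonzero(subset)
    if cols.size == 0:
        return 0.0
    correct = 0
    for i in range(n_samples):
        mask = np.arange(n_samples) != i
        c0 = X[mask & (y == 0)][:, cols].mean(axis=0)
        c1 = X[mask & (y == 1)][:, cols].mean(axis=0)
        d0 = np.linalg.norm(X[i, cols] - c0)
        d1 = np.linalg.norm(X[i, cols] - c1)
        correct += int((d1 < d0) == (y[i] == 1))
    return correct / n_samples

p = np.full(n_genes, 0.05)       # per-gene inclusion probabilities
min_evals = 5                    # lower bound on per-gene appearances
counts = np.zeros(n_genes, dtype=int)

# Step 1: preliminary pass ensuring every gene is considered min_evals times.
while counts.min() < min_evals:
    boosted = np.where(counts < min_evals, 0.5, p)
    subset = rng.random(n_genes) < boosted
    counts += subset
    score(subset)  # in a full method these scores would also be recorded

# Step 2: standard cross-entropy loop with elite-based smoothed updates.
for _ in range(20):
    samples = [rng.random(n_genes) < p for _ in range(50)]
    samples.sort(key=score, reverse=True)
    elite = np.array(samples[:10])
    p = 0.7 * p + 0.3 * elite.mean(axis=0)

best_genes = np.flatnonzero(p > 0.5)  # genes favored jointly, not one by one
```

Because subsets are scored as a whole, a gene that is unremarkable on its own can still earn a high inclusion probability through the company it keeps, which is the "complementarity" the abstract refers to; the preliminary pass prevents such genes from being starved of evaluations early on.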