Issue No. 06, November/December 2011 (vol. 8)
pp. 1633-1641
Hong-Dong Li, Res. Center of Modernization of Traditional Chinese Medicines, Central South Univ., Changsha, China
Yi-Zeng Liang, Res. Center of Modernization of Traditional Chinese Medicines, Central South Univ., Changsha, China
Qing-Song Xu, Sch. of Math. Sci., Central South Univ., Changsha, China
Dong-Sheng Cao, Res. Center of Modernization of Traditional Chinese Medicines, Central South Univ., Changsha, China
Bin-Bin Tan, Res. Center of Modernization of Traditional Chinese Medicines, Central South Univ., Changsha, China
Bai-Chuan Deng, Res. Center of Modernization of Traditional Chinese Medicines, Central South Univ., Changsha, China
Chen-Chen Lin, Res. Center of Modernization of Traditional Chinese Medicines, Central South Univ., Changsha, China
ABSTRACT
Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) to select informative genes. The rationale for margin influence analysis is that the margin of an SVM is an important factor underlying the generalization performance of SVM models. In brief, MIA uses the Mann-Whitney U test to reveal genes that have a statistically significant influence on the margin. The Mann-Whitney U test is used rather than the two-sample t test because it is a nonparametric method that makes no assumptions about the underlying distribution and is also robust. Using two publicly available cancer microarray data sets, we demonstrate that MIA typically selects a small number of margin-influencing genes and achieves classification accuracy comparable to that reported in the literature. These distinguishing features and this performance may make MIA a good alternative for gene selection in high-dimensional microarray data. (The MATLAB source code, licensed under the GNU General Public License Version 2.0, is freely available at http://code.google.com/p/mia2009/.)
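The MIA procedure summarized in the abstract can be illustrated with a short Python sketch. It is a minimal, standard-library-only illustration under our own assumptions, not the authors' MATLAB implementation: "model population analysis" is taken to mean training many SVMs on randomly drawn, fixed-size gene subsets; a bias-free linear SVM trained with the Pegasos stochastic sub-gradient method stands in for a full SVM solver; the margin is taken as 1/||w||; and the Mann-Whitney U test uses the normal approximation without tie or continuity correction. The function names (`train_linear_svm`, `mann_whitney_u`, `margin_influence_analysis`) and parameters (`n_models`, `subset_size`, `lam`, `epochs`) are hypothetical.

```python
# Hypothetical sketch of margin influence analysis (MIA); not the authors' code.
import math
import random
from statistics import median


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Bias-free linear SVM trained with Pegasos (stochastic sub-gradient
    descent on the L2-regularized hinge loss). Assumption: a stand-in for a
    full SVM solver. X: list of samples (lists of floats); y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            score = sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (regularizer)
            if y[i] * score < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test: returns U for sample `a` and a
    normal-approximation p-value (average ranks for ties, no corrections)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mu) / sigma
    return u_a, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def margin_influence_analysis(X, y, n_models=200, subset_size=None,
                              lam=0.01, epochs=30, seed=0):
    """Build a population of SVMs on random gene subsets, record each model's
    margin (assumed 1/||w||), then U-test, per gene, the margins of models that
    include the gene against those that exclude it.
    Returns (gene, median margin shift, p-value) tuples sorted by p-value."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    k = subset_size or max(1, n_genes // 2)
    subsets, margins = [], []
    for m in range(n_models):
        genes = rng.sample(range(n_genes), k)
        Xs = [[row[g] for g in genes] for row in X]
        w = train_linear_svm(Xs, y, lam=lam, epochs=epochs, seed=m)
        norm = math.sqrt(sum(v * v for v in w))
        margins.append(1.0 / max(norm, 1e-12))
        subsets.append(set(genes))
    results = []
    for g in range(n_genes):
        inc = [mg for mg, s in zip(margins, subsets) if g in s]
        exc = [mg for mg, s in zip(margins, subsets) if g not in s]
        if not inc or not exc:  # gene never (or always) sampled: no test possible
            results.append((g, 0.0, 1.0))
            continue
        _, p_value = mann_whitney_u(inc, exc)
        results.append((g, median(inc) - median(exc), p_value))
    return sorted(results, key=lambda r: r[2])
```

On a synthetic data set in which only gene 0 separates the two classes, gene 0 would be expected at the top of the returned list. Because every subset has the same size, including one gene displaces others, so uninformative genes can also show small margin shifts; on real data a multiple-testing correction of the per-gene p-values is advisable.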
INDEX TERMS
Support vector machines, Input variables, Computational modeling, Analytical models, Biological system modeling, Cancer, Predictive models, model population analysis, informative gene selection, cancer classification, margin
CITATION
Hong-Dong Li, Yi-Zeng Liang, Qing-Song Xu, Dong-Sheng Cao, Bin-Bin Tan, Bai-Chuan Deng, Chen-Chen Lin, "Recipe for uncovering predictive genes using support vector machines based on model population analysis", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1633-1641, November/December 2011, doi:10.1109/TCBB.2011.36