Biomarker Identification and Cancer Classification Based on Microarray Data Using Laplace Naive Bayes Model with Mean Shrinkage
Nov.-Dec. 2012 (vol. 9 no. 6)
pp. 1649-1662
Meng-Yun Wu, Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Dao-Qing Dai, Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Yu Shi, Sch. of Math. & Stat., Zhengzhou Normal Univ., Zhengzhou, China
Hong Yan, Dept. of Electron. Eng., City Univ. of Hong Kong, Kowloon, China
Xiao-Fei Zhang, Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Biomarker identification and cancer classification are two closely related problems. In gene expression data sets, genes that share a biological pathway can be highly correlated. Moreover, gene expression data sets may contain outliers arising from chemical or electrical causes. A good gene selection method should therefore take group effects into account and be robust to outliers. In this paper, we propose a Laplace naive Bayes model with mean shrinkage (LNB-MS). The Laplace distribution, rather than the normal distribution, is used as the conditional distribution of the samples because it is less sensitive to outliers and has been applied in many fields. The key technique is an L_1 penalty imposed on the mean of each class to achieve automatic feature selection. The objective function of the proposed model is piecewise linear with respect to the mean of each class, so its optimum can be found simply by evaluating it at the breakpoints. An efficient algorithm is designed to estimate the parameters of the model. A new strategy is introduced that uses the number of selected features to control the regularization parameter. Experimental results on simulated data sets and 17 publicly available cancer data sets attest to the accuracy, sparsity, efficiency, and robustness of the proposed algorithm. Many biomarkers identified with our method have been verified in biochemical or biomedical research. An analysis of the biological and functional correlation of the genes, based on Gene Ontology (GO) terms, shows that the proposed method guarantees that highly correlated genes are selected together.
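The breakpoint argument in the abstract can be illustrated with a minimal sketch. It is not the authors' LNB-MS algorithm: it assumes a single feature within a single class, data already centered so that shrinkage is toward zero, and a fixed Laplace scale. The function name `shrunken_location` and the parameters `lam` and `scale` are hypothetical. Under those assumptions the penalized negative log-likelihood is sum_i |x_i - mu| / scale + lam * |mu|, which is piecewise linear and convex in mu, so a minimizer lies at one of the breakpoints (a sample value or zero).

```python
from typing import Sequence


def shrunken_location(samples: Sequence[float], lam: float, scale: float = 1.0) -> float:
    """Minimize sum_i |x_i - mu| / scale + lam * |mu| over mu.

    Illustrative assumption: one feature, one class, shrinkage toward zero.
    """
    def objective(mu: float) -> float:
        # Laplace negative log-likelihood (up to constants) plus the L_1 penalty.
        return sum(abs(x - mu) for x in samples) / scale + lam * abs(mu)

    # The slope of the objective only changes at the samples and at zero,
    # so it is enough to evaluate the objective at those points.
    breakpoints = sorted(set(samples) | {0.0})

    # On ties, prefer the candidate closest to zero (the sparser solution).
    return min(breakpoints, key=lambda mu: (objective(mu), abs(mu)))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(shrunken_location(data, lam=0.0))  # no penalty: the sample median
    print(shrunken_location(data, lam=5.0))  # strong penalty: shrunk to zero
```

With no penalty the estimate reduces to the sample median, which is the maximum likelihood location of a Laplace distribution. With a large enough penalty the estimate is exactly zero, which is how an L_1 penalty on a class mean can switch a feature off for that class.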
Index Terms:
physiological models, biochemistry, bioinformatics, cancer, genetics, lab-on-a-chip, Laplace equations, medical computing, normal distribution, highly correlated genes, biomarker identification, cancer classification, microarray data, mean shrinkage, closely related problems, gene expression data sets, biological pathway, electrical reasons, chemical reasons, gene selection method, Laplace naive Bayes model, Laplace distribution, automatic feature selection, piecewise linear function, regularization parameter, cancer data sets, biochemical research, biomedical research, biological analysis, functional correlation, gene ontology terms, Gene expression, Support vector machines, Biological system modeling, Computational modeling, gene expression data analysis, L_1 penalty
Citation:
Meng-Yun Wu, Dao-Qing Dai, Yu Shi, Hong Yan, Xiao-Fei Zhang, "Biomarker Identification and Cancer Classification Based on Microarray Data Using Laplace Naive Bayes Model with Mean Shrinkage," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1649-1662, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.105