Multiclass Kernel-Imbedded Gaussian Processes for Microarray Data Analysis
July/August 2011 (vol. 8 no. 4)
pp. 1041-1053
Xin Zhao, Sanjole Inc., Honolulu
Leo Wang-Kit Cheung, Loyola University Medical Center, Maywood
Identifying genes that are significantly differentially expressed in a disease can help in understanding the disease at the genomic level. A hierarchical statistical model named the multiclass kernel-imbedded Gaussian process (mKIGP) is developed under a Bayesian framework for multiclass classification problems using microarray gene expression data. Specifically, based on a multinomial probit regression setting, an empirically adaptive algorithm with a cascading structure is designed to find appropriate featuring kernels, to discover potentially significant genes, and to make optimal tumor/cancer class predictions. A Gibbs sampler is adopted as the core of the algorithm to perform Bayesian inference. A prescreening procedure is implemented to reduce the computational complexity. In simulated examples, mKIGP performed very close to the Bayesian bound and outperformed the referenced state-of-the-art methods in a linear case, a nonlinear case, and a case with a mislabeled training sample. It shows great promise for problems on which linear-model-based methods become unsatisfactory. The mKIGP was also applied to four published real microarray data sets, where it was very effective at identifying significant differentially expressed genes and predicting classes in all four.
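To make two of the ingredients named in the abstract more concrete, the following is a minimal Python sketch of a gene prescreening step followed by a kernel matrix built on the retained genes. The abstract does not say which prescreening statistic or kernel family mKIGP uses, so the one-way ANOVA F statistic, the Gaussian kernel, and the names `f_statistic`, `prescreen`, `gaussian_gram`, `keep`, and `width` are all assumptions chosen for illustration, not the authors' procedure.

```python
import math


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across classes.

    Assumption: this statistic stands in for the paper's unspecified
    prescreening score. Requires at least two classes and more samples
    than classes.
    """
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if within == 0.0:
        return math.inf if between > 0.0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def prescreen(X, labels, keep):
    """Return sorted indices of the `keep` top-scoring genes.

    X is a list of samples, each a list of gene expression values.
    """
    n_genes = len(X[0])
    scores = [
        f_statistic([row[j] for row in X], labels) for j in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def gaussian_gram(X, genes, width):
    """Gram matrix of a Gaussian kernel restricted to the selected genes.

    Assumption: k(a, b) = exp(-||a - b||^2 / (2 * width^2)); the kernel
    family and its width parameter are illustrative choices.
    """
    def kernel(a, b):
        d2 = sum((a[j] - b[j]) ** 2 for j in genes)
        return math.exp(-d2 / (2.0 * width ** 2))

    return [[kernel(a, b) for b in X] for a in X]
```

In a sampler-based method of the kind the abstract describes, a matrix such as the one returned by `gaussian_gram` would plausibly serve as the covariance of the Gaussian process prior, and shrinking the gene set first keeps each kernel evaluation cheap. How mKIGP actually couples these steps is not stated in the abstract.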

Index Terms:
Gene expression, Gaussian processes, Monte Carlo methods, nonlinear multiclass systems.
Citation:
Xin Zhao, Leo Wang-Kit Cheung, "Multiclass Kernel-Imbedded Gaussian Processes for Microarray Data Analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 1041-1053, July-Aug. 2011, doi:10.1109/TCBB.2010.85