Identifying Gene Pathways Associated with Cancer Characteristics via Sparse Statistical Methods
July-Aug. 2012 (vol. 9 no. 4)
pp. 966-972
C. Print, Dept. of Mol. Med. & Pathology, Univ. of Auckland, Auckland, New Zealand
R. Yoshida, Dept. of Stat. Modeling, Inst. of Stat. Math., Tokyo, Japan
R. Yamaguchi, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
M. Nagasaki, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
S. Imoto, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
S. Miyano, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
A. Niida, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
T. Shimamura, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
S. Kawano, Dept. of Math. Sci., Osaka Prefecture Univ., Sakai, Japan
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate pathway knowledge into the model, we define a set of pathway activities from microarray gene expression data using Sparse Probabilistic Principal Component Analysis (SPPCA). A pathway activity logistic regression model is then formulated for the cancer phenotype. To select pathway activities related to binary cancer phenotypes, we estimate the parameters with the elastic net and derive a model selection criterion for choosing the tuning parameters involved in the estimation. The proposed method can also reverse-engineer gene networks from the multiple identified pathways, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process through an analysis of breast cancer gene expression data.
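
The pipeline summarized in the abstract (pathway activities, then an elastic-net-penalized logistic regression on those activities) can be illustrated with a small sketch. This is not the authors' implementation. It rests on several assumptions: the mean expression of a pathway's genes stands in for the SPPCA-based activity, the penalized likelihood is fitted by plain proximal gradient descent, and the tuning parameters `lam` and `alpha` are fixed by hand instead of being chosen by a model selection criterion. All function names, gene names, and pathway names are hypothetical.

```python
import math

def pathway_activities(samples, pathways):
    """Per-sample activity of each pathway.
    ASSUMPTION: mean expression of the pathway's genes is used as a
    simple stand-in for the SPPCA-based activity in the abstract."""
    names = sorted(pathways)
    acts = [[sum(s[g] for g in pathways[p]) / len(pathways[p]) for p in names]
            for s in samples]
    return names, acts

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty: shrink z toward zero by t.
    return math.copysign(max(abs(z) - t, 0.0), z)

def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, step=0.1, n_iter=2000):
    """Minimize  -(1/n) * loglik + lam * (alpha * |b|_1 + (1 - alpha)/2 * |b|_2^2)
    by proximal gradient descent. The intercept b0 is not penalized.
    ASSUMPTION: lam, alpha, step and n_iter are illustrative defaults."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
                 for xi, yi in zip(X, y)]
        grad0 = sum(resid) / n
        # Gradient of the smooth part (log-likelihood plus ridge term).
        grad = [sum(r * xi[j] for r, xi in zip(resid, X)) / n
                + lam * (1.0 - alpha) * b[j] for j in range(p)]
        b0 -= step * grad0
        # Gradient step followed by soft-thresholding for the L1 term.
        b = [soft_threshold(b[j] - step * grad[j], step * lam * alpha)
             for j in range(p)]
    return b0, b

def toy_example():
    """Balanced toy design: P_signal tracks the phenotype (with overlap
    between classes), P_noise is unrelated to it by construction."""
    pathways = {"P_signal": ["g1", "g2"], "P_noise": ["g3", "g4"]}
    samples, y = [], []
    for label in (0, 1):
        for offset in (-1.5, 0.0, 1.5):
            for noise in (-1.0, 1.0):
                s = (2 * label - 1) + offset
                samples.append({"g1": s + 0.2, "g2": s - 0.2,
                                "g3": noise + 0.3, "g4": noise - 0.3})
                y.append(label)
    names, X = pathway_activities(samples, pathways)
    _, coef = fit_elastic_net_logistic(X, y)
    return dict(zip(names, coef))

if __name__ == "__main__":
    print(toy_example())
```

On this toy design the fitted weight for `P_signal` should be positive while the weight for `P_noise` is shrunk to zero, which is the kind of sparse pathway selection the elastic net penalty is meant to provide. A real analysis would replace the mean-expression activity with the SPPCA scores and tune `lam` and `alpha` on the data.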

Index Terms:
regression analysis, bioinformatics, biological techniques, biomedical engineering, cancer, data mining, genetics, medical computing, parameter estimation, principal component analysis, breast cancer gene expression data, gene pathway identification, cancer characteristics, sparse statistical methods, cancer heterogeneity, microarray gene expression data, sparse probabilistic PCA, SPPCA, pathway activity logistic regression model, cancer phenotype, elastic net, model selection criterion, gene-gene associations, Logistics, Breast cancer, Loading, Gene expression, Supervised learning, sparse supervised learning, gene network, microarray, pathway activity
C. Print, R. Yoshida, R. Yamaguchi, M. Nagasaki, S. Imoto, S. Miyano, A. Niida, T. Shimamura, S. Kawano, "Identifying Gene Pathways Associated with Cancer Characteristics via Sparse Statistical Methods," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 966-972, July-Aug. 2012, doi:10.1109/TCBB.2012.48