Issue No. 04 - July-Aug. (2012 vol. 9)
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2012.48
C. Print, Dept. of Mol. Med. & Pathology, Univ. of Auckland, Auckland, New Zealand
R. Yoshida, Dept. of Stat. Modeling, Inst. of Stat. Math., Tokyo, Japan
R. Yamaguchi, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
M. Nagasaki, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
S. Imoto, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
S. Miyano, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
A. Niida, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
T. Shimamura, Inst. of Med. Sci., Univ. of Tokyo, Tokyo, Japan
S. Kawano, Dept. of Math. Sci., Osaka Prefecture Univ., Sakai, Japan
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of pathway activities from microarray gene expression data based on sparse probabilistic principal component analysis (SPPCA). A pathway activity logistic regression model is then formulated for the cancer phenotype. To select the pathway activities related to binary cancer phenotypes, we use the elastic net for parameter estimation and derive a model selection criterion for choosing the tuning parameters involved in the estimation. The proposed method can also reverse-engineer gene networks from the multiple identified pathways, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through an analysis of breast cancer gene expression data.
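The two-stage procedure described in the abstract (summarize each pathway's genes into an activity score, then fit a penalized logistic regression of the phenotype on those scores) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: ordinary PCA by power iteration stands in for SPPCA (no sparsity and no probabilistic model), the elastic net is fitted by proximal gradient descent, the tuning parameters `lam` and `alpha` are fixed by hand instead of being chosen with the paper's model selection criterion, and all function names and the synthetic data are hypothetical.

```python
import math
import random


def first_pc_scores(X, n_iter=100):
    """Per-sample scores on the first principal component of X (rows = samples).

    Assumption: plain PCA via power iteration, a non-sparse stand-in for SPPCA.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        u = [sum(r[j] * v[j] for j in range(p)) for r in Xc]            # Xc v
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]  # Xc^T Xc v
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in Xc]


def standardize(s):
    """Center and scale a score vector to mean 0, variance 1."""
    m = sum(s) / len(s)
    sd = math.sqrt(sum((x - m) ** 2 for x in s) / len(s)) or 1.0
    return [(x - m) / sd for x in s]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |z| (the L1 part of the elastic net)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_enet_logistic(Z, y, lam=0.05, alpha=0.5, lr=0.5, n_iter=1000):
    """Minimize mean log-loss + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2)
    by proximal gradient descent; the intercept b0 is not penalized."""
    n, k = len(Z), len(Z[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * k
        for zi, yi in zip(Z, y):
            r = sigmoid(b0 + sum(bj * zj for bj, zj in zip(b, zi))) - yi
            g0 += r / n
            for j in range(k):
                g[j] += r * zi[j] / n
        b0 -= lr * g0
        # Gradient step on the smooth part (log-loss + ridge), then the L1 prox.
        b = [soft_threshold(b[j] - lr * (g[j] + lam * (1 - alpha) * b[j]),
                            lr * lam * alpha) for j in range(k)]
    return b0, b


# Hypothetical synthetic data: 3 pathways of 4 genes each; only pathway 0 drives y.
rng = random.Random(0)
n = 100
factors = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
pathways = [[[f[k] + 0.3 * rng.gauss(0, 1) for _ in range(4)] for f in factors]
            for k in range(3)]
y = [1 if f[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for f in factors]

activities = [standardize(first_pc_scores(X)) for X in pathways]
Z = [list(row) for row in zip(*activities)]  # samples x pathway activities
intercept, coef = fit_enet_logistic(Z, y)
print("pathway coefficients:", [round(c, 3) for c in coef])
```

Because the sign of a principal component is arbitrary, only the magnitude of each coefficient is meaningful in this sketch. With this synthetic setup the coefficient for pathway 0 should dominate, while the elastic net shrinks the coefficients of the two unrelated pathways toward zero.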
regression analysis, bioinformatics, biological techniques, biomedical engineering, cancer, data mining, genetics, medical computing, parameter estimation, principal component analysis, breast cancer gene expression data, gene pathway identification, cancer characteristics, sparse statistical methods, cancer heterogeneity, microarray gene expression data, sparse probabilistic PCA, SPPCA, pathway activity logistic regression model, cancer phenotype, elastic net, model selection criterion, gene-gene associations, Logistics, Breast cancer, Loading, Gene expression, Supervised learning, sparse supervised learning, gene network, microarray, pathway activity
C. Print et al., "Identifying Gene Pathways Associated with Cancer Characteristics via Sparse Statistical Methods," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 966-972, 2012.