A Hybrid Approach to Survival Model Building Using Integration of Clinical and Molecular Information in Censored Data
July-Aug. 2012 (vol. 9 no. 4)
pp. 1091-1105
M. W. Kattan, Dept. of Quantitative Health Sci., Cleveland Clinic Found., Cleveland, OH, USA
Ickwon Choi, Dept. of Electr. Eng. & Comput. Sci., Case Western Reserve Univ., Cleveland, OH, USA
B. J. Wells, Dept. of Quantitative Health Sci., Cleveland Clinic Found., Cleveland, OH, USA
Changhong Yu, Dept. of Quantitative Health Sci., Cleveland Clinic Found., Cleveland, OH, USA
In the medical community, prognostic models that use clinicopathologic features to predict prognosis after a given treatment have been externally validated and are used in practice. In recent years, most research has focused on high-dimensional genomic data with small sample sizes. Because clinically similar but molecularly heterogeneous tumors may produce different clinical outcomes, combining clinical and genomic information, which may be complementary, is crucial to improving the quality of prognostic predictions. However, integrating schemes for clinico-genomic models, in particular for parsimonious models, are lacking because of the P ≫ N problem (far more predictors than samples). We propose a methodology for building a reduced yet accurate integrative model using a hybrid approach based on the Cox regression model, which uses several dimension reduction techniques, L2 penalized maximum likelihood estimation (PMLE), and resampling methods to tackle the problem. The predictive accuracy of the modeling approach is assessed by several metrics via an independent and thorough scheme to compare competing methods. In breast cancer data studies of metastasis and death events, we show that the proposed methodology can improve prediction accuracy and build a final model with a hybrid signature that remains parsimonious when both types of variables are integrated.
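To make the L2 penalized estimation step concrete, the following is a minimal sketch of a ridge-penalized Cox fit on simulated data, followed by a concordance index as one possible accuracy metric. It is not the authors' pipeline: the dimension reduction and resampling stages are omitted, the data are synthetic, and all names (`fit_ridge_cox`, `concordance_index`, `lam`) are hypothetical choices made for illustration. The partial likelihood is written in its basic form with no special handling of tied event times.

```python
import numpy as np

def penalized_loglik(beta, X, event, risk, lam):
    """Cox partial log-likelihood minus an L2 (ridge) penalty."""
    eta = X @ beta
    log_s0 = np.log(risk @ np.exp(eta))  # log of the risk-set sums
    return np.sum(event * (eta - log_s0)) - 0.5 * lam * beta @ beta

def fit_ridge_cox(X, time, event, lam=1.0, n_iter=50, tol=1e-8):
    """Maximize the penalized partial likelihood by Newton steps with step halving."""
    n, p = X.shape
    beta = np.zeros(p)
    # risk[i, j] = 1 if subject j is still at risk at subject i's observed time
    risk = (time[None, :] >= time[:, None]).astype(float)
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        s0 = risk @ w                                   # (n,)
        s1 = risk @ (w[:, None] * X)                    # (n, p)
        mean = s1 / s0[:, None]                         # risk-set weighted covariate means
        grad = (event[:, None] * (X - mean)).sum(axis=0) - lam * beta
        xx = w[:, None, None] * X[:, :, None] * X[:, None, :]
        s2 = np.einsum('ij,jkl->ikl', risk, xx)         # (n, p, p)
        cov = s2 / s0[:, None, None] - mean[:, :, None] * mean[:, None, :]
        # Negative Hessian of the penalized objective (positive definite for lam > 0)
        info = (event[:, None, None] * cov).sum(axis=0) + lam * np.eye(p)
        step = np.linalg.solve(info, grad)
        # Halve the step until the objective does not decrease
        obj, t = penalized_loglik(beta, X, event, risk, lam), 1.0
        while penalized_loglik(beta + t * step, X, event, risk, lam) < obj and t > 1e-4:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

def concordance_index(score, time, event):
    """Harrell-style C-index: higher score should mean earlier failure."""
    comparable = (time[:, None] < time[None, :]) & (event[:, None] == 1)
    higher = score[:, None] > score[None, :]
    tied = score[:, None] == score[None, :]
    return (np.sum(comparable & higher) + 0.5 * np.sum(comparable & tied)) / np.sum(comparable)

# Synthetic censored data (assumed setup, purely for illustration)
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
true_beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
T = rng.exponential(1.0 / np.exp(X @ true_beta))  # latent event times
C = rng.exponential(2.0, size=n)                  # independent censoring times
time = np.minimum(T, C)
event = (T <= C).astype(float)

beta_weak = fit_ridge_cox(X, time, event, lam=0.01)
beta_strong = fit_ridge_cox(X, time, event, lam=50.0)
cidx = concordance_index(X @ beta_weak, time, event)  # apparent (training) accuracy only
```

A larger `lam` shrinks the coefficient vector toward zero, which is what keeps the fit stable when predictors are numerous relative to events. The concordance index here is computed on the training data, so it is optimistic; an honest assessment would evaluate it on held-out or resampled data.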

Index Terms:
tumours, bioinformatics, cancer, data analysis, genomics, maximum likelihood estimation, medical computing, modelling, patient treatment, regression analysis, sampling methods, death event, survival model building, clinical information, molecular information, censored data, prognostic models, clinicopathologic features, prognosis prediction, high dimensional genomic data, molecularly heterogeneous tumors, genomic information, clinical-genomic models, parsimonious model, Cox regression model, dimension reduction techniques, L2 penalized maximum likelihood estimation, PMLE, resampling methods, breast cancer data, breast cancer metastasis, Data models, Bioinformatics, Predictive models, Computational modeling, Feature extraction, Genomics, Indexes, data integration, Prognostic prediction model, dimension reduction, Clinico-genomic information, censored time to event data, feature selection, Cox model
M. W. Kattan, Ickwon Choi, B. J. Wells, Changhong Yu, "A Hybrid Approach to Survival Model Building Using Integration of Clinical and Molecular Information in Censored Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1091-1105, July-Aug. 2012, doi:10.1109/TCBB.2012.31