Jointly Analyzing Gene Expression and Copy Number Data in Breast Cancer Using Data Reduction Models
January-March 2006 (vol. 3 no. 1)
pp. 2-16
With the growing surge of biological measurements, the problem of integrating and analyzing different types of genomic measurements has become an immediate challenge for elucidating events at the molecular level. In order to address the problem of integrating different data types, we present a framework that locates variation patterns in two biological inputs based on the generalized singular value decomposition (GSVD). In this work, we jointly examine gene expression and copy number data and iteratively project the data on different decomposition directions defined by the projection angle \theta in the GSVD. With the proper choice of \theta, we locate similar and dissimilar patterns of variation between both data types. We discuss the properties of our algorithm using simulated data and conduct a case study with biologically verified results. Ultimately, we demonstrate the efficacy of our method on two genome-wide breast cancer studies to identify genes with large variation in expression and copy number across numerous cell line and tumor samples. Our method identifies genes that are statistically significant in both input measurements. The proposed method is useful for a wide variety of joint copy number and expression-based studies. Supplementary information is available online, including software implementations and experimental data.
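
As a rough illustration of the kind of decomposition the abstract refers to, the sketch below computes generalized singular value ratios for two matrices that share their columns (for example, genes by samples for expression and copy number). It is not the authors' implementation. The function name `gsvd_ratios`, the Cholesky-based route through the Gram matrices, and the angle convention arctan(ratio) - pi/4 are assumptions made here for illustration, and the sketch requires the second matrix to have full column rank.

```python
import numpy as np


def gsvd_ratios(A, B):
    """Hypothetical helper: generalized singular value ratios of A and B.

    A and B must have the same number of columns, and B must have full
    column rank. Returns (ratios, theta, X), where X is a shared basis with
    X.T @ (B.T @ B) @ X = I and X.T @ (A.T @ A) @ X = diag(ratios**2).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")

    gram_a = A.T @ A
    gram_b = B.T @ B

    # Reduce the generalized symmetric eigenproblem
    #   gram_a x = lambda * gram_b x
    # to a standard one using the Cholesky factor of gram_b.
    L = np.linalg.cholesky(gram_b)
    L_inv = np.linalg.inv(L)
    M = L_inv @ gram_a @ L_inv.T
    M = (M + M.T) / 2.0  # symmetrize against round-off

    lam, Y = np.linalg.eigh(M)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative values

    ratios = np.sqrt(lam)  # variation in A relative to B, per direction
    X = L_inv.T @ Y        # shared basis for both data sets

    # Assumed convention: theta near 0 means the direction is equally
    # significant in both inputs; theta near +pi/4 or -pi/4 means it is
    # dominated by A or by B, respectively.
    theta = np.arctan(ratios) - np.pi / 4.0
    return ratios, theta, X
```

Under this convention, directions with theta close to zero would correspond to variation shared by both inputs, while directions near the extremes would correspond to variation specific to one of them. Forming Gram matrices squares the condition number, so a dedicated GSVD routine would be preferable for ill-conditioned data.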

Index Terms:
Generalized singular value decomposition, cDNA microarray data, CGH microarray data, gene expression, DNA copy numbers, breast cancer.
John A. Berger, Sampsa Hautaniemi, Sanjit K. Mitra, Jaakko Astola, "Jointly Analyzing Gene Expression and Copy Number Data in Breast Cancer Using Data Reduction Models," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 1, pp. 2-16, Jan.-March 2006, doi:10.1109/TCBB.2006.10