Issue No. 01, Jan.-Feb. 2013 (vol. 10)
pp. 230-235
Xiaowei Zhou , Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
Can Yang , Dept. of Biostat., Yale Univ., New Haven, CT, USA
Xiang Wan , Dept. of Comput. Sci., Hong Kong Baptist Univ., Hong Kong, China
Hongyu Zhao , Dept. of Biostat., Yale Univ., New Haven, CT, USA
Weichuan Yu , Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
ABSTRACT
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to approximate the multisample aCGH data by a matrix and to minimize the total variation of each sample together with the nuclear norm of the whole matrix. In this way, we exploit the smoothness of each sample and the correlation among multiple samples simultaneously within a convex optimization framework. We also develop an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms state-of-the-art techniques under a wide range of scenarios and is capable of processing large data sets with millions of probes.
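The abstract names the two penalties but not the exact objective or the solver. As a minimal sketch, assume the data form a probes-by-samples matrix Y of log-ratios and the estimate X minimizes 0.5·‖Y − X‖_F² + λ·Σ_j TV(x_j) + μ·‖X‖_*, where TV(x) = Σ_i |x_{i+1} − x_i| is taken along each sample's probes and ‖X‖_* is the sum of singular values. The squared-error fit term, the ADMM splitting X = Z, the dual projected-gradient TV solver, and the names tv_prox_columns, svt and tv_spectral_admm are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np


def tv_prox_columns(V, t, n_iter=300):
    """Approximately solve, independently for each column v of V,
        argmin_x 0.5 * ||x - v||^2 + t * sum_i |x[i+1] - x[i]|
    by projected gradient ascent on the dual (step 1/4, since ||D||^2 <= 4)."""

    def d_adjoint(W):
        # Adjoint of the first-difference operator D = np.diff(., axis=0).
        return -np.diff(np.pad(W, ((1, 1), (0, 0))), axis=0)

    W = np.zeros((V.shape[0] - 1, V.shape[1]))  # one dual variable per adjacent probe pair
    for _ in range(n_iter):
        X = V - d_adjoint(W)  # primal point implied by the current dual
        W = np.clip(W + 0.25 * np.diff(X, axis=0), -t, t)
    return V - d_adjoint(W)


def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def tv_spectral_admm(Y, lam, mu, rho=1.0, n_iter=50):
    """Hypothetical ADMM sketch for
        min_X 0.5 * ||Y - X||_F^2 + lam * sum_j TV(X[:, j]) + mu * ||X||_*
    using the splitting X = Z, where Z carries the nuclear-norm term."""
    X, Z = Y.copy(), Y.copy()
    U = np.zeros_like(Y)  # scaled dual variable for the constraint X = Z
    for _ in range(n_iter):
        # X-step: merge the two quadratic terms, then denoise each sample with 1D TV.
        X = tv_prox_columns((Y + rho * (Z - U)) / (1.0 + rho), lam / (1.0 + rho))
        # Z-step: shrink singular values, which couples information across samples.
        Z = svt(X + U, mu / rho)
        # Dual update on the consensus constraint X = Z.
        U += X - Z
    return X


# Synthetic example (made-up data, not from the paper): 200 probes x 10 samples.
rng = np.random.default_rng(0)
truth = np.zeros((200, 10))
truth[60:100, :6] = 1.0    # a gain shared by samples 0-5
truth[140:160, 3:] = -1.0  # a loss shared by samples 3-9
Y = truth + 0.3 * rng.standard_normal(truth.shape)
X = tv_spectral_admm(Y, lam=1.0, mu=1.0)
print("noisy MSE:", np.mean((Y - truth) ** 2), "estimate MSE:", np.mean((X - truth) ** 2))
```

The splitting is chosen so that each step has a cheap proximal operator: 1D TV denoising acts on each sample (column) separately, while singular value thresholding shrinks the spectrum of the whole matrix and so shares information across samples. The inner TV solver is deliberately simple; for millions of probes, an exact direct 1D TV solver would be a faster drop-in replacement.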
INDEX TERMS
Spectral analysis, Optimization, Convex optimization, CNV, aCGH, total variation, spectral regularization
CITATION
Xiaowei Zhou, Can Yang, Xiang Wan, Hongyu Zhao, Weichuan Yu, "Multisample aCGH Data Analysis via Total Variation and Spectral Regularization", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 1, pp. 230-235, Jan.-Feb. 2013, doi:10.1109/TCBB.2012.166