Issue No.02 - March-April (2013 vol.10)
pp: 372-382
Jiaoyun Yang , Anhui Province & the Sch. of Comput. Sci., Univ. of Sci. & Technol. of China, Hefei, China
Yun Xu , Anhui Province & the Sch. of Comput. Sci., Univ. of Sci. & Technol. of China, Hefei, China
Xiaohui Yao , Anhui Province & the Sch. of Comput. Sci., Univ. of Sci. & Technol. of China, Hefei, China
Guoliang Chen , Anhui Province & the Sch. of Comput. Sci., Univ. of Sci. & Technol. of China, Hefei, China
ABSTRACT
An enormous amount of sequence data has been generated with the development of new DNA sequencing technologies, which presents great challenges for computational biology problems such as haplotype phasing. Although considerable effort has gone into this problem, current methods still cannot efficiently handle the incoming flood of large-scale data. In this paper, we propose a flow network model to tackle the haplotype phasing problem, and we explain some classical haplotype phasing rules in terms of this model. By incorporating the heuristic knowledge obtained from these classical rules, we design an algorithm, FNphasing, based on the flow network model. Theoretically, the time complexity of our algorithm is $O(n^2 m + m^2)$, which is better than that of 2SNP, currently one of the most efficient algorithms. Experiments on several simulated data sets show that, on large-scale data sets, our algorithm is significantly faster than the state-of-the-art Beagle algorithm. FNphasing also achieves accuracy equal or superior to that of other approaches.
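To make the problem statement concrete, here is a minimal Python sketch of the phasing problem itself. It is not FNphasing, whose flow network construction is not described in this abstract. The function name `compatible_pairs` and the genotype encoding (0 and 1 for the two homozygous states, 2 for a heterozygous site) are assumptions chosen for illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs that can explain one genotype.

    Assumed encoding (illustrative, not taken from the paper):
    0 = homozygous for allele 0, 1 = homozygous for allele 1,
    2 = heterozygous (one haplotype carries 0, the other carries 1).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    if not het_sites:
        # No ambiguity: both haplotypes equal the genotype.
        h = tuple(genotype)
        return [(h, h)]

    pairs = []
    # Fix allele 0 on the first haplotype at the first heterozygous site
    # so that each unordered pair is generated exactly once.
    for rest in product((0, 1), repeat=len(het_sites) - 1):
        alleles = (0,) + rest
        h1, h2 = list(genotype), list(genotype)
        for site, allele in zip(het_sites, alleles):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


if __name__ == "__main__":
    for h1, h2 in compatible_pairs([2, 0, 2, 1]):
        print(h1, h2)
```

A genotype with k heterozygous sites has 2^(k-1) candidate pairs, so exhaustive enumeration across a population quickly becomes infeasible. This is the motivation for heuristic and model-based phasing methods such as those compared in the abstract.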
INDEX TERMS
Hidden Markov models, Algorithm design and analysis, Accuracy, Phylogeny, Computational biology, Bioinformatics, Data models, flow network, heuristic methods, Haplotype phasing
CITATION
Jiaoyun Yang, Yun Xu, Xiaohui Yao, Guoliang Chen, "FNphasing: A Novel Fast Heuristic Algorithm for Haplotype Phasing Based on Flow Network Model", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 372-382, March-April 2013, doi:10.1109/TCBB.2013.18