Issue No.02 - April-June (2008 vol.5)
pp: 252-261
ABSTRACT
A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the possible four nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is possible to obtain rather easily the conflated SNP information in the so-called genotype. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years, as haplotype information leads to increased statistical power of disease association tests. However, many of the existing algorithms have impractical running times for phasing large genotype datasets such as those generated by the international HapMap project. In this paper we propose a highly scalable algorithm based on entropy minimization. Our algorithm is capable of phasing both unrelated and related genotypes coming from complex pedigrees. Experimental results on both real and simulated datasets show that our algorithm achieves a phasing accuracy worse than, but close to, that of the best existing methods while being several orders of magnitude faster. The open source code implementation of the algorithm and a web interface are publicly available at http://dna.engr.uconn.edu/~software/ent/.
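To make the terms in the abstract concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a common toy encoding that the abstract does not specify: haplotypes are strings over '0'/'1' (the two alleles at each SNP), and a genotype records '0' or '1' where both haplotypes agree and '2' where they differ. All function names are hypothetical. The search is brute force and exponential in the number of heterozygous sites, so it only shows what an entropy-minimizing phasing optimizes on tiny inputs; it makes no claim about how the paper achieves scalability.

```python
from collections import Counter
from itertools import product
from math import log2


def conflate(h1, h2):
    """Genotype of two haplotypes: the shared allele where they agree, '2' where they differ."""
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def compatible_pairs(genotype):
    """List every unordered haplotype pair that conflates to the given genotype."""
    het = [i for i, g in enumerate(genotype) if g == "2"]
    if not het:
        return [(genotype, genotype)]
    pairs = []
    # Fix the first heterozygous site to '0' in h1 so each pair is listed once.
    for bits in product("01", repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, ("0",) + bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


def phasing_entropy(phasing):
    """Shannon entropy (in bits) of the haplotype frequencies in a list of haplotype pairs."""
    counts = Counter(h for pair in phasing for h in pair)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Brute-force search over all compatible phasings; toy inputs only."""
    candidates = product(*(compatible_pairs(g) for g in genotypes))
    return min(candidates, key=phasing_entropy)


if __name__ == "__main__":
    genotypes = ["22", "00", "11"]
    best = min_entropy_phasing(genotypes)
    print(best, phasing_entropy(best))
```

In this toy input, the ambiguous genotype "22" can be explained either by the pair ("00", "11") or by ("01", "10"). The first choice reuses haplotypes already required by the two homozygous genotypes, so it yields fewer distinct haplotypes and a lower entropy, and the search selects it.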
INDEX TERMS
Single Nucleotide Polymorphism, haplotype, genotype phasing, algorithm.
CITATION
Alexander Gusev, Ion I. Măndoiu, Bogdan Paşaniuc, "Highly Scalable Genotype Phasing by Entropy Minimization", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 2, pp. 252-261, April-June 2008, doi:10.1109/TCBB.2007.70223