IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue No.02 - April-June (2008 vol.5)

pp: 252-261

ABSTRACT

A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the possible four nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is relatively easy to obtain the conflated SNP information in the so-called genotype. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years, as haplotype information leads to increased statistical power of disease association tests. However, many of the existing algorithms have impractical running times for phasing large genotype datasets such as those generated by the international HapMap project. In this paper we propose a highly scalable algorithm based on entropy minimization. Our algorithm is capable of phasing both unrelated and related genotypes coming from complex pedigrees. Experimental results on both real and simulated datasets show that our algorithm achieves a phasing accuracy close to, though somewhat worse than, that of the best existing methods while being several orders of magnitude faster. The open-source implementation of the algorithm and a web interface are publicly available at http://dna.engr.uconn.edu/~software/ent/.
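To make the entropy-minimization objective concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a common genotype encoding (0 or 1 for a homozygous site, 2 for a heterozygous site), and the function names `compatible_pairs`, `entropy`, and `min_entropy_phasing` are hypothetical. It searches all phasings exhaustively, which is feasible only for toy inputs and is exactly the cost a scalable method must avoid.

```python
from collections import Counter
from itertools import product
from math import log2


def compatible_pairs(genotype):
    """List the unordered haplotype pairs that explain one genotype.

    Assumed encoding: 0 or 1 = homozygous for that allele, 2 = heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    base = [0 if g == 2 else g for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Pin the first heterozygous site to 0 in h1 so that (h1, h2) and
    # (h2, h1) are not both enumerated.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = base[:], base[:]
        for site, b in zip(het, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def entropy(phasing):
    """Shannon entropy (bits) of the haplotype frequencies in a phasing."""
    counts = Counter(h for pair in phasing for h in pair)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Brute-force search for the phasing with the lowest entropy."""
    candidates = product(*(compatible_pairs(g) for g in genotypes))
    return min(candidates, key=entropy)


if __name__ == "__main__":
    genotypes = [(0, 0), (1, 1), (2, 2)]
    best = min_entropy_phasing(genotypes)
    print(best, entropy(best))
```

In this toy input the doubly heterozygous genotype `(2, 2)` can be explained by either `00/11` or `01/10`. The sketch prefers `00/11` because it reuses haplotypes already present in the other two individuals, which gives a more concentrated frequency distribution and therefore lower entropy.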

INDEX TERMS

Single Nucleotide Polymorphism, haplotype, genotype phasing, algorithm.

CITATION

Alexander Gusev, Ion I. Măndoiu, Bogdan Paşaniuc, "Highly Scalable Genotype Phasing by Entropy Minimization",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 5, no. 2, pp. 252-261, April-June 2008, doi:10.1109/TCBB.2007.70223