Issue No.03 - July-September (2010 vol.7)
pp: 511-523
Paola Bertolazzi, CNR-Istituto di Analisi dei Sistemi e Informatica, Roma
Alessandra Godi, CNR-Istituto di Analisi dei Sistemi e Informatica, Roma
Giuseppe Lancia, University of Udine, Udine
Haplotype data play an important role in several genetic studies, e.g., mapping of complex disease genes, drug design, and evolutionary studies on populations. However, the experimental determination of haplotypes is expensive and time-consuming. This motivates the increasing interest in techniques for inferring haplotype data from genotypes, which can instead be obtained quickly and economically. Several such techniques are based on the maximum parsimony principle, which has been justified by both experimental results and theoretical arguments. However, the problem of haplotype inference by parsimony was shown to be NP-hard, which limits the applicability of exact parsimony-based techniques to relatively small data sets. In this paper, we introduce the collapse rule, a generalization of the well-known Clark's rule, and describe a new heuristic algorithm for haplotype inference (implemented in a program called CollHaps), based on parsimony and the iterative application of collapse rules. The performance of CollHaps is tested on several data sets. The experiments show that CollHaps enables the user to process large data sets and obtain very “parsimonious” solutions in short processing times. They also show a correlation, especially for large data sets, between parsimony and correct reconstruction, supporting the validity of the parsimony principle as a means of producing accurate solutions.
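To make the setting concrete, the following is a minimal sketch of the classical Clark's rule that the abstract refers to. It is not the collapse rule and not the CollHaps algorithm, whose details are not given here. The encoding is an assumption made for illustration: a genotype is a string over 0, 1, 2 (2 marks a heterozygous site), a haplotype is a string over 0, 1, and all function names are hypothetical.

```python
def is_compatible(h, g):
    """A haplotype h can explain genotype g if it matches every homozygous site."""
    return all(gs == "2" or gs == hs for hs, gs in zip(h, g))


def complement(h, g):
    """Clark's rule: given h compatible with g, return the mate h2 with h + h2 = g."""
    return "".join(
        ("1" if hs == "0" else "0") if gs == "2" else gs
        for hs, gs in zip(h, g)
    )


def clark_inference(genotypes):
    """Greedy resolution of genotypes by repeated application of Clark's rule.

    Returns (resolved, known): resolved maps each explained genotype to a
    haplotype pair, known is the set of distinct haplotypes used. Genotypes
    that no known haplotype can explain are left out of resolved.
    """
    known = set()
    resolved = {}
    # Seed with unambiguous genotypes (at most one heterozygous site).
    for g in genotypes:
        if g.count("2") <= 1:
            pair = (g.replace("2", "0"), g.replace("2", "1"))
            resolved[g] = pair
            known.update(pair)
    progress = True
    while progress:
        progress = False
        for g in genotypes:
            if g in resolved:
                continue
            # sorted() makes the greedy choice deterministic for this sketch.
            for h in sorted(known):
                if is_compatible(h, g):
                    h2 = complement(h, g)
                    resolved[g] = (h, h2)
                    known.add(h2)
                    progress = True
                    break
    return resolved, known


resolved, known = clark_inference(["0101", "2201", "2221"])
print(resolved)
print(len(known), "distinct haplotypes")
```

The size of `known` is the quantity that a parsimony-based method tries to keep small. The greedy order in which genotypes and haplotypes are tried affects that size, which is one reason heuristics beyond the plain rule are of interest.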
Biology and genetics, heuristic methods, discrete mathematics.
Paola Bertolazzi, Alessandra Godi, Giuseppe Lancia, "CollHaps: A Heuristic Approach to Haplotype Inference by Parsimony," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 511-523, July-September 2010, doi:10.1109/TCBB.2008.130