This Article 
 Bibliographic References 
 Add to: 
Integer Programming Approaches to Haplotype Inference by Pure Parsimony
April-June 2006 (vol. 3 no. 2)
pp. 141-154
In 2003, Gusfield introduced the Haplotype Inference by Pure Parsimony (HIPP) problem and presented an integer program (IP) that quickly solved many simulated instances of the problem [1]. Although it solved well on small instances, Gusfield's IP can be of exponential size in the worst case. Several authors [2], [3] have presented polynomial-sized IPs for the problem. In this paper, we further the work on IP approaches to HIPP. We extend the existing polynomial-sized IPs by introducing several classes of valid cuts for the IP. We also present a new polynomial-sized IP formulation that is a hybrid between two existing IP formulations and inherits many of the strengths of both. Many problems that are too complex for the exponential-sized formulations can still be solved in our new formulation in a reasonable amount of time. We provide a detailed empirical comparison of these IP formulations on both simulated and real genotype sequences. Our formulation can also be extended in a variety of ways to allow errors in the input or model the structure of the population under consideration.

[1] D. Gusfield, “Haplotype Inference by Pure Parsimony,” Proc. 14th Ann. Symp. Combinatorial Pattern Matching, pp. 144-155, 2003.
[2] B.V. Halldórsson, V. Bafna, N. Edwards, R. Lippert, S. Yooseph, and S. Istrail, “A Survey of Computational Methods for Determining Haplotypes,” Computational Methods for SNPs and Haplotype Inference: Proc. DIMACS/RECOMB Satellite Workshop, pp. 26-47, 2004.
[3] G. Lancia, C.M. Pinotti, and R. Rizzi, “Haplotyping Populations by Pure Parsimony: Complexity of Exact and Approximation Algorithms,” INFORMS J. Computing, vol. 16, no. 4, pp. 348-359, 2004.
[4] A.G. Clark, K.M. Weiss, D.A. Nickerson, S.L. Taylor, A. Buchanan, J. Stengard, V. Salomaa, E. Vartiainen, M. Perola, E. Boerwinkle, and C.F. Sing, “Haplotype Structure and Population Genetic Inferences from Nucleotide-Sequence Variation in Human Lipoprotein Lipase,” Am. J. Human Genetics, vol. 63, no. 2, pp. 595-612, 1998.
[5] D. Gusfield, “Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms,” J. Computational Biology, vol. 8, no. 3, pp. 305-313. 2001.
[6] N. Edwards, Personal communication to D. Brown, Aug. 2004.
[7] K. Kalpakis and P. Namjoshi, “Haplotype Phasing Using Semidefinite Programming,” Proc. Fifth IEEE Int'l Symp. Bioinformatics and Bioeng., pp. 145-152, 2005.
[8] J. He and A. Zelikovsky, “Linear Reduction for Haplotype Inference,” Proc. Fourth Ann. Workshop Algorithms in Bioinformatics, pp. 242-253, 2004.
[9] M.J. Daly, J.D. Rioux, S.F. Schaffner, T.J. Hudson, and E.S. Lander, “High-Resolution Haplotype Structure in the Human Genome,” Nature Genetics, vol. 29, no. 2, pp. 229-232, 2001.
[10] N. Patil et al., “Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21,” Science, vol. 294, no. 5547, pp. 1719-1723, 2001.
[11] A.G. Clark, “Inference of Haplotypes from PCR-Amplified Samples of Diploid Populations,” Molecular Biology and Evolution, vol. 7, no. 2, pp. 111-112, 1990.
[12] K. Lindblad-Toh et al., “Genome Sequence, Comparative Analysis and Haplotype Structure of the Domestic Dog,” Nature, vol. 438, no. 7069, pp. 803-809, 2005.
[13] D.A. Hinds, L.L. Stuve, G.B. Nilsen, E. Halperin, E. Eskin, D.G. Ballinger, K.A. Frazer, and D.R. Cox, “Whole-Genome Patterns of Common DNA Variation in Three Human Populations,” Science, vol. 307, no. 5712, pp. 1072-1079, 2005.
[14] The Int'l HapMap Consortium, “A Haplotype Map of the Human Genome,” Nature, vol. 437, no. 7063, pp. 1299-1300, 2005.
[15] L. Wang and Y. Xu, “Haplotype Inference by Maximum Parsimony,” Bioinformatics, vol. 19, no. 14, pp. 1773-1780, 2003.
[16] D.G. Brown and I.M. Harrower, “A New Integer Programming Formulation for the Pure Parsimony Problem in Haplotype Analysis,” Proc. Fourth Ann. Workshop Algorithms in Bioinformatics, pp. 254-265, 2004.
[17] Z. Li, W. Zhou, X.S. Zhang, and L. Chen, “A Parsimonious Tree-Grow Method for Haplotype Inference,” Bioinformatics, vol. 21, no. 17, pp. 3475-3481, 2005.
[18] Y.-T. Huang, K.-M. Chao, and T. Chen, “An Approximation Algorithm for Haplotype Inference by Maximum Parsimony,” J. Computational Biology, vol. 12, no. 10, pp. 1261-1274, 2005.
[19] J.E. Mitchell, Branch-and-Cut Algorithms for Combinatorial Optimization Problems, pp. 65-77. Oxford Univ. Press, Jan. 2002.
[20] A. Charnes, “Optimality and Degeneracy in Linear Programming,” Econometrica, vol. 20, pp. 160-170, 1952.
[21] C. Bron and J. Kerbosch, “Algorithm 457: Finding All Cliques of an Undirected Graph,” Comm. ACM, vol. 16, no. 9, pp. 575-577, 1973.
[22] ILOG CPLEX 8. 1: Reference Manual, ILOG SA, Dec. 2002.
[23] R.R. Hudson, “Generating Samples under a Wright-Fisher Neutral Model of Genetic Variation,” Bioinformatics, vol. 18, no. 2, pp. 337-338, 2002.
[24] The Int'l HapMap Consortium, “Integrating Ethics and Science in the International HapMap Project,” Nature Rev. Genetics, vol. 5, no. 6, pp. 467-475, 2004.
[25] T. Barzuza, J. Beckmann, R. Shamir, and I. Pe'er, “Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs,” Proc. 14th Ann. Symp. Combinatorial Pattern Matching, pp. 14-31, 2004.
[26] V. Bafna, D. Gusfield, S. Hannenhalli, and S. Yooseph, “A Note on Efficient Computation of Haplotypes via Perfect Phylogeny,” J. Computational Biology, vol. 11, no. 5, pp. 858-866, 2004.
[27] V. Bafna, D. Gusfield, G. Lancia, and S. Yooseph, “Haplotyping as Perfect Phylogeny: A Direct Approach,” J. Computational Biology, vol. 10, pp. 323-340, 2003.

Index Terms:
Computations on discrete structures, integer programming, biology and genetics, haplotype inference.
Daniel G. Brown, Ian M. Harrower, "Integer Programming Approaches to Haplotype Inference by Pure Parsimony," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 141-154, April-June 2006, doi:10.1109/TCBB.2006.24
Usage of this product signifies your acceptance of the Terms of Use.