CSDL Home IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010 vol.7 Issue No.04 - October-December

pp: 598-610

Paola Bonizzoni, Università degli Studi di Milano-Bicocca, Milano

Gianluca Della Vedova, Università degli Studi di Milano-Bicocca, Milano

Riccardo Dondi, Università degli Studi di Bergamo, Bergamo

Yuri Pirola, Università degli Studi di Milano-Bicocca, Milano

Romeo Rizzi, Università degli Studi di Udine, Udine

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.52

ABSTRACT

Haplotype resolution from xor-genotype data was recently formulated as a new model for genetic studies [1]. Xor-genotype data is cheap to obtain: it distinguishes heterozygous from homozygous sites without identifying the homozygous alleles. In this paper, we propose a formulation based on pure parsimony, a well-known model used in haplotype inference. We give exact solutions of the problem by providing polynomial-time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results rely on combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial-time k-approximation, where k is the maximum number of xor-genotypes containing a given single nucleotide polymorphism (SNP). Finally, we propose a heuristic and present an experimental analysis showing that it scales to large real-world instances taken from the HapMap project.
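
To make the xor-genotype notion concrete, here is a minimal Python sketch. It assumes the usual binary encoding of haplotypes (one 0/1 allele per SNP), which the abstract does not spell out, and the function names `xor_genotype` and `resolves` are hypothetical. It only checks whether a candidate set of haplotypes explains a set of xor-genotypes; it is not any of the paper's algorithms. Under pure parsimony, the goal would be a smallest such set.

```python
from itertools import combinations_with_replacement


def xor_genotype(h1, h2):
    """Return the xor-genotype of two haplotypes.

    Assumption: haplotypes are equal-length tuples of 0/1 alleles.
    A 1 marks a heterozygous site (the alleles differ), a 0 marks a
    homozygous site; which allele is present there is not recorded.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of SNPs")
    return tuple(a ^ b for a, b in zip(h1, h2))


def resolves(haplotypes, xor_genotypes):
    """Check that every xor-genotype is the XOR of some pair of haplotypes.

    Pairs are taken with replacement, so an all-zero xor-genotype
    (a fully homozygous individual) is explained by any single haplotype.
    """
    haps = {tuple(h) for h in haplotypes}
    obtainable = {
        xor_genotype(a, b)
        for a, b in combinations_with_replacement(haps, 2)
    }
    return all(tuple(g) in obtainable for g in xor_genotypes)


if __name__ == "__main__":
    haps = [(0, 0, 0), (1, 0, 1), (0, 1, 1)]
    print(xor_genotype(haps[1], haps[2]))
    print(resolves(haps, [(1, 0, 1), (1, 1, 0), (0, 1, 1)]))
```

In this example three haplotypes explain three xor-genotypes, since each genotype is the XOR of a different pair from the set.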

INDEX TERMS

Algorithms, haplotype resolution, pure parsimony, approximation algorithms, graph representation.

CITATION

Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, Romeo Rizzi, "Pure Parsimony Xor Haplotyping", *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 7, no. 4, pp. 598-610, October-December 2010, doi:10.1109/TCBB.2010.52

REFERENCES

- [1] T. Barzuza, J.S. Beckmann, R. Shamir, and I. Pe'er, "Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs," Proc. 15th Ann. Symp. Combinatorial Pattern Matching (CPM '04), pp. 14-31, http://springerlink.metapress.com/openurl.asp?genre=article&issn=0302-9743&volume=3109&spage=14, July 2004.
- [2] W. Xiao and P.J. Oefner, "Denaturing High-Performance Liquid Chromatography: A Review," Human Mutation, vol. 17, no. 6, pp. 439-474, 2001.
- [3] T. Barzuza, J.S. Beckmann, R. Shamir, and I. Pe'er, "Computational Problems in Perfect Phylogeny Haplotyping: Typing without Calling the Allele," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 5, no. 1, pp. 101-109, Jan.-Mar. 2008.
- [4] D. Gusfield, "Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions," Proc. Sixth Ann. Conf. Research in Computational Molecular Biology (RECOMB), pp. 166-175, 2002.
- [5] N. Patil, A.J. Berno, D.A. Hinds, W.A. Barrett, J.M. Doshi, C.R. Hacker, C.R. Kautzer, D.H. Lee, C. Marjoribanks, D.P. McDonough, B.T. Nguyen, M.C. Norris, J.B. Sheehan, N. Shen, D. Stern, R.P. Stokowski, D.J. Thomas, M.O. Trulson, K.R. Vyas, K.A. Frazer, S.P. Fodor, and D.R. Cox, "Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21," Science, vol. 294, no. 5547, pp. 1719-1723, http://dx.doi.org/10.1126/science.1065573, Nov. 2001.
- [6] R. Sharan, B.V. Halldórsson, and S. Istrail, "Islands of Tractability for Parsimony Haplotyping," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 3, no. 3, pp. 303-311, July-Sept. 2006.
- [7] D. Gusfield, "Haplotype Inference by Pure Parsimony," Proc. 14th Symp. Combinatorial Pattern Matching (CPM), pp. 144-155, 2003.
- [8] D.G. Brown and I.M. Harrower, "Integer Programming Approaches to Haplotype Inference by Pure Parsimony," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 141-154, Apr. 2006.
- [9] G. Lancia, M.C. Pinotti, and R. Rizzi, "Haplotyping Populations by Pure Parsimony: Complexity of Exact and Approximation Algorithms," INFORMS J. Computing, vol. 16, no. 4, pp. 348-359, 2004.
- [10] L. van Iersel, J. Keijsper, S. Kelk, and L. Stougie, "Shorelines of Islands of Tractability: Algorithms for Parsimony and Minimum Perfect Phylogeny Haplotyping Problems," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 301-312, Apr.-June 2008.
- [11] G. Lancia and R. Rizzi, "A Polynomial Case of the Parsimony Haplotyping Problem," Operations Research Letters, vol. 34, no. 3, pp. 289-295, 2006.
- [12] R. Diestel, Graph Theory, third ed., vol. 173, Springer-Verlag, 2005.
- [13] R. Downey and M. Fellows, Parameterized Complexity. Springer-Verlag, 1999.
- [14] C. Savage, "A Survey of Combinatorial Gray Codes," SIAM Rev., vol. 39, no. 4, pp. 605-629, http://dx.doi.org/10.1137/S0036144595295272, 1997.
- [15] J.R. Bitner, G. Ehrlich, and E.M. Reingold, "Efficient Generation of the Binary Reflected Gray Code and Its Applications," Comm. ACM, vol. 19, no. 9, pp. 517-521, http://doi.acm.org/10.1145/360336.360343, 1976.
- [16] E. Fredkin, "Trie Memory," Comm. ACM, vol. 3, no. 9, pp. 490-499, http://doi.acm.org/10.1145/367390.367400, 1960.
- [17] W.T. Tutte, "An Algorithm for Determining whether a Given Binary Matroid Is Graphic," Proc. Am. Math. Soc., vol. 11, no. 6, pp. 905-917, 1960.
- [18] R.E. Bixby and D.K. Wagner, "An Almost Linear-Time Algorithm for Graph Realization," Math. of Operations Research, vol. 13, pp. 99-123, 1988.
- [19] S. Fujishige, "An Efficient PQ-Graph Algorithm for Solving the Graph Realization Problem," J. Computer and System Science, vol. 21, pp. 63-68, 1980.
- [20] T. Barzuza, GREAL: Software for the Graph Realization Problem, http://acgt.cs.tau.ac.il/greal/, 2010.
- [21] F. Gavril and R. Tamari, "An Algorithm for Constructing Edge-Trees from Hypergraphs," Networks, vol. 13, no. 3, pp. 377-388, http://dx.doi.org/10.1002/net.3230130306, 1983.
- [22] R.R. Hudson, "Generating Samples under a Wright-Fisher Neutral Model of Genetic Variation," Bioinformatics, vol. 18, no. 2, pp. 337-338, http://dx.doi.org/10.1093/bioinformatics/18.2.337, Feb. 2002.
- [23] GLPK: the GNU Linear Programming Kit, http://www.gnu.org/software/glpk/, 2010.
- [24] The International HapMap Consortium, "A Haplotype Map of the Human Genome," Nature, vol. 437, no. 7063, pp. 1299-1320, http://dx.doi.org/10.1038/nature04226, 2005.