Issue No.01 - January/February (2012 vol.9)
pp: 12-25
Y. Pirola , Dipt. di Inf. Sist. e Comun., Univ. degli Studi di Milano-Bicocca, Milan, Italy
P. Bonizzoni , Dipt. di Inf. Sist. e Comun., Univ. degli Studi di Milano-Bicocca, Milan, Italy
Tao Jiang , Dept. of Comput. Sci. & Eng., Univ. of California, Riverside, CA, USA
ABSTRACT
Haplotype Inference (HI) is a computational challenge of crucial importance in a range of genetic studies. Pedigrees allow haplotypes to be inferred from genotypes more accurately than population data does, since Mendelian inheritance restricts the set of possible solutions. In this work, we define a new HI problem on pedigrees, called the Minimum-Change Haplotype Configuration (MCHC) problem, that allows two types of genetic variation events: recombinations and mutations. Our new formulation extends the Minimum-Recombinant Haplotype Configuration (MRHC) problem, which has been proposed in the literature to overcome the limitations of classic statistical haplotyping methods. Our contribution is twofold. First, we prove that the MCHC problem is APX-hard under several restrictions. Second, we propose an efficient and accurate heuristic algorithm for MCHC based on an L-reduction to a well-known coding problem. Our heuristic can also be used to solve the original MRHC problem and can take advantage of additional knowledge about the input genotypes. Moreover, the L-reduction proves for the first time that MCHC and MRHC are O(nm/log nm)-approximable on general pedigrees, where n is the pedigree size and m is the genotype length. Finally, we present an extensive experimental evaluation of our heuristic algorithm and compare it with several other state-of-the-art methods for HI on pedigrees.
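The claim that Mendelian inheritance restricts the set of possible solutions can be illustrated with a minimal sketch. It is not the heuristic described in the abstract: the genotype encoding (0 and 1 for homozygous sites, 2 for heterozygous sites), the function names, and the restriction to a single father-mother-child trio with no recombinations or mutations are all assumptions made for illustration. The filter checks only a necessary condition on the child, namely that one of its haplotypes is compatible with the father and the other with the mother.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Enumerate the unordered haplotype pairs compatible with a genotype.

    Assumed encoding: 0 = homozygous for allele 0, 1 = homozygous for
    allele 1, 2 = heterozygous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        # At each heterozygous site the two haplotypes carry opposite alleles.
        for site, bit in zip(het_sites, bits):
            h1[site], h2[site] = bit, 1 - bit
        # Sort so that (a, b) and (b, a) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def child_pairs_without_changes(child, father, mother):
    """Keep the child's pairs that need no recombination or mutation.

    One haplotype must be compatible with the father's genotype and the
    other with the mother's (a necessary condition only).
    """
    father_haps = {h for pair in haplotype_pairs(father) for h in pair}
    mother_haps = {h for pair in haplotype_pairs(mother) for h in pair}
    return {
        (a, b)
        for a, b in haplotype_pairs(child)
        if (a in father_haps and b in mother_haps)
        or (b in father_haps and a in mother_haps)
    }


if __name__ == "__main__":
    child = (2, 2, 2)
    father = (0, 0, 0)
    mother = (2, 2, 2)
    print(len(haplotype_pairs(child)))  # candidates from the genotype alone
    print(child_pairs_without_changes(child, father, mother))
```

In this hypothetical trio the child's genotype alone admits four haplotype pairs, while the homozygous father leaves a single pair. When no pair survives the filter, at least one recombination or mutation is needed to explain the data, which is the kind of event that MCHC counts.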
INDEX TERMS
Genetics, Bioinformatics, Computational biology, Inference algorithms, Polynomials, Heuristic algorithms, Bones, mutations, Algorithms, haplotyping, haplotype inference, pedigree, recombinations
CITATION
Y. Pirola, P. Bonizzoni, Tao Jiang, "An Efficient Algorithm for Haplotype Inference on Pedigrees with Recombinations and Mutations", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 1, pp. 12-25, January/February 2012, doi:10.1109/TCBB.2011.51