Issue No. 06, Nov.-Dec. 2012 (vol. 9)
pp. 1582-1594
Y. Pirola, Dipt. di Inf. Sist. e Comun. (DISCo), Univ. degli Studi di Milano-Bicocca, Milan, Italy
G. Della Vedova, Dipt. di Statistica, Univ. degli Studi di Milano-Bicocca, Milan, Italy
S. Biffani, Centro Ric. e Studi Agroalimentari (CeRSA), Lodi, Italy
A. Stella, Centro Ric. e Studi Agroalimentari (CeRSA), Lodi, Italy
P. Bonizzoni, Dipt. di Inf. Sist. e Comun. (DISCo), Univ. degli Studi di Milano-Bicocca, Milan, Italy
ABSTRACT
The MINIMUM-RECOMBINANT HAPLOTYPE CONFIGURATION problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances that have improved its efficiency, its applicability to real data sets has been limited because it does not take into account some important phenomena, such as mutations, genotyping errors, and missing data. In this work, we propose the MINIMUM-RECOMBINANT HAPLOTYPE CONFIGURATION WITH BOUNDED ERRORS problem (MRHCE), which extends the original MRHC formulation by incorporating the two most common characteristics of real data: errors and missing genotypes (including untyped individuals). We describe a practical algorithm for MRHCE that is based on a reduction to the well-known Satisfiability problem (SAT) and exploits recent advances in the constraint programming literature. An experimental analysis demonstrates the biological soundness of the phasing model and the effectiveness (in both accuracy and performance) of the algorithm under several scenarios. The analysis of real data and the comparison with state-of-the-art programs reveal that our approach couples better scalability to large and complex pedigrees with the explicit inclusion of genotyping errors in the model.
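To make the MRHC objective and the role of an error bound more concrete, here is a toy sketch for a single parent-to-child transmission. A recombination is counted as a switch, between consecutive loci, in which of the parent's two haplotypes the child's allele is inherited from. This is not the SAT-based algorithm described in the abstract, which solves the whole pedigree jointly; it is a small dynamic program under assumptions of our own. The function `min_recombinations` and all its conventions are hypothetical: the parent's haplotypes `h0` and `h1` are assumed already phased and coded as 0/1 alleles, `None` marks a missing allele in the child's inherited haplotype, and an "error" is loosely modeled as a locus allowed to disagree with the parental haplotype it comes from, charged against a budget `max_errors`.

```python
import math
from typing import Optional, Sequence


def min_recombinations(h0: Sequence[int], h1: Sequence[int],
                       child: Sequence[Optional[int]],
                       max_errors: int = 0) -> float:
    """Fewest recombinations explaining `child` as a mosaic of h0 and h1.

    Hypothetical toy model: `child` is the haplotype the child inherited from
    this parent, None marks a missing allele, and up to `max_errors` loci may
    disagree with the parental haplotype they are inherited from. Returns
    math.inf when no explanation fits within the error budget.
    """
    n = len(child)
    if len(h0) != n or len(h1) != n:
        raise ValueError("haplotypes must have the same length")
    if max_errors < 0:
        raise ValueError("max_errors must be non-negative")
    # cost[s][e]: fewest recombinations so far, given that the current locus
    # comes from parental haplotype s and e loci have been flagged as errors.
    cost = [[0] + [math.inf] * max_errors for _ in range(2)]
    for i, allele in enumerate(child):
        new = [[math.inf] * (max_errors + 1) for _ in range(2)]
        for s, hap in enumerate((h0, h1)):
            # A missing allele is compatible with either parental haplotype.
            consistent = allele is None or allele == hap[i]
            for e in range(max_errors + 1):
                prev_e = e if consistent else e - 1  # a mismatch uses one error
                if prev_e < 0:
                    continue
                # Either stay on haplotype s, or switch to it (one recombination).
                new[s][e] = min(cost[s][prev_e], cost[1 - s][prev_e] + 1)
        cost = new
    # Best over both final sources and every error count within the budget.
    return min(min(row) for row in cost)
```

For example, with `h0 = [0, 0, 0, 0]`, `h1 = [1, 1, 1, 1]` and `child = [0, 1, 0, 0]`, the call returns 2 when `max_errors=0` (switch to `h1` and back) but 0 when `max_errors=1`, since the isolated allele can be explained as one error instead of two recombinations. This trade-off between recombinations and errors is what motivates bounding the number of errors rather than ignoring them.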
INDEX TERMS
Bioinformatics, Genetics, Algorithm design and analysis, Computational biology, Genomics, recombinations, Haplotype inference, pedigrees, genotyping errors, missing data
CITATION
Y. Pirola, G. Della Vedova, S. Biffani, A. Stella, P. Bonizzoni, "A fast and practical approach to genotype phasing and imputation on a pedigree with erroneous and incomplete information," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1582-1594, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.100