IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, May/June 2011

pp: 710-722

Elizabeth S. Allman, University of Alaska, Fairbanks

Sonja Petrović, University of Illinois at Chicago, Chicago

John A. Rhodes, University of Alaska, Fairbanks

Seth Sullivant, North Carolina State University, Raleigh

DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.79

ABSTRACT

Phylogenetic data arising on two possibly different tree topologies might be mixed through several biological mechanisms, including incomplete lineage sorting or horizontal gene transfer in the case of different topologies, or simply different substitution processes on characters in the case of the same topology. Recent work on a 2-state symmetric model of character change showed that, for 4 taxa, such a mixture model has nonidentifiable parameters, and thus it is theoretically impossible to determine the two tree topologies from any amount of data under such circumstances. Here, the question of identifiability is investigated for two-tree mixtures of the 4-state group-based models, which are more relevant to DNA sequence data. Using algebraic techniques, we show that the tree parameters are identifiable for the Jukes-Cantor (JC) and Kimura 2-parameter (K2P) models. We also prove that generic substitution parameters for the JC mixture models are identifiable, and, for the K2P and Kimura 3-parameter (K3P) models, we obtain generic identifiability results for mixtures on the same tree. This indicates that the full phylogenetic signal remains in such mixtures, and that the 2-state symmetric result is thus a misleading guide to the behavior of other models.
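To make the object of study concrete: a two-tree mixture is a weighted average of the site-pattern distributions that two 4-taxon trees produce. The sketch below evaluates such a mixture under the Jukes-Cantor model in plain Python. It assumes a uniform root distribution, branch lengths measured in expected substitutions per site, and leaves labeled 0 to 3; the function names, the tree encoding, and the parameter values are hypothetical and chosen only for illustration. It computes the forward model only and does not reproduce the paper's algebraic identifiability arguments.

```python
import itertools
import math

STATES = range(4)  # nucleotides A, C, G, T encoded as 0..3


def jc_matrix(t):
    """Jukes-Cantor transition matrix for branch length t (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay  # probability of no net change along the branch
    diff = 0.25 - 0.25 * decay  # probability of ending in one specific other state
    return [[same if i == j else diff for j in STATES] for i in STATES]


def quartet_distribution(split, lengths):
    """Site-pattern distribution of a 4-taxon JC tree (hypothetical helper).

    split   -- ((a, b), (c, d)): leaves on each side of the internal edge
    lengths -- branch lengths keyed by leaf index (pendant edges) and "internal"
    Returns a dict mapping each pattern (x0, x1, x2, x3) to its probability.
    """
    (a, b), (c, d) = split
    P = {key: jc_matrix(t) for key, t in lengths.items()}
    dist = {}
    for x in itertools.product(STATES, repeat=4):
        total = 0.0
        # Sum over hidden states: u joins leaves a and b (used as the root,
        # with uniform distribution 1/4); v joins leaves c and d.
        for u, v in itertools.product(STATES, repeat=2):
            total += (0.25 * P[a][u][x[a]] * P[b][u][x[b]]
                      * P["internal"][u][v]
                      * P[c][v][x[c]] * P[d][v][x[d]])
        dist[x] = total
    return dist


def two_tree_mixture(weight, tree1, tree2):
    """Return weight * P_T1 + (1 - weight) * P_T2 for two (split, lengths) pairs."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("mixing weight must lie in [0, 1]")
    d1 = quartet_distribution(*tree1)
    d2 = quartet_distribution(*tree2)
    return {x: weight * d1[x] + (1.0 - weight) * d2[x] for x in d1}


# Made-up parameters: tree 1 has split 01|23, tree 2 has split 02|13.
tree1 = (((0, 1), (2, 3)), {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.3, "internal": 0.05})
tree2 = (((0, 2), (1, 3)), {0: 0.2, 1: 0.1, 2: 0.2, 3: 0.1, "internal": 0.1})
mix = two_tree_mixture(0.3, tree1, tree2)
print(len(mix), round(sum(mix.values()), 12))  # 256 1.0
```

In this setting, identifiability asks whether the mixing weight, both topologies, and the branch parameters can be recovered from the 256 pattern probabilities in `mix` alone.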

INDEX TERMS

Phylogenetic mixture, group-based model, identifiability of phylogenetic models.

CITATION

Elizabeth S. Allman, Sonja Petrović, John A. Rhodes, and Seth Sullivant, "Identifiability of Two-Tree Mixtures for Group-Based Models," *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 8, no. 3, pp. 710-722, May/June 2011, doi:10.1109/TCBB.2010.79.

REFERENCES

- [1] E.S. Allman, C. Ané, and J.A. Rhodes, "Identifiability of a Markovian Model of Molecular Evolution with Gamma-Distributed Rates," Advances in Applied Probability, vol. 40, pp. 229-249, arXiv:0709.0531, 2008.
- [2] E.S. Allman, C. Matias, and J.A. Rhodes, "Identifiability of Parameters in Latent Structure Models with Many Observed Variables," Annals of Statistics, vol. 37, no. 6A, pp. 3099-3132, 2009.
- [3] E.S. Allman, S. Petrović, J.A. Rhodes, and S. Sullivant, "Supplementary Material," http://www4.ncsu.edu/~smsulli2/Pubs/TwoTreesWebsite/twotrees.html, 2009.
- [4] E.S. Allman and J.A. Rhodes, "Phylogenetic Invariants for the General Markov Model of Sequence Mutation," Math. Biosciences, vol. 186, no. 2, pp. 113-144, 2003.
- [5] E.S. Allman and J.A. Rhodes, "The Identifiability of Tree Topology for Phylogenetic Models, Including Covarion and Mixture Models," J. Computational Biology, vol. 13, no. 5, pp. 1101-1113, 2006.
- [6] E.S. Allman and J.A. Rhodes, "Phylogenetic Ideals and Varieties for the General Markov Model," Advances in Applied Math., vol. 40, no. 2, pp. 127-148, 2008.
- [7] E.S. Allman and J.A. Rhodes, "The Identifiability of Covarion Models in Phylogenetics," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 6, no. 1, pp. 76-88, Jan.-Mar. 2009.
- [8] J.A. Cavender and J. Felsenstein, "Invariants of Phylogenies in a Simple Case with Discrete States," J. Classification, vol. 4, pp. 57-71, 1987.
- [9] J.T. Chang, "Full Reconstruction of Markov Models on Evolutionary Trees: Identifiability and Consistency," Math. Biosciences, vol. 137, no. 1, pp. 51-73, 1996.
- [10] D. Cox, J. Little, and D. O'Shea, Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, second ed. Springer-Verlag, 1997.
- [11] J. Draisma, "A Tropical Approach to Secant Dimensions," J. Pure and Applied Algebra, vol. 212, no. 2, pp. 349-363, 2008.
- [12] M. Drton, B. Sturmfels, and S. Sullivant, Lectures on Algebraic Statistics, Oberwolfach Seminars, vol. 39. Birkhäuser Basel, 2008.
- [13] S.N. Evans and T.P. Speed, "Invariants of Some Probability Models Used in Phylogenetic Inference," Annals of Statistics, vol. 21, no. 1, pp. 355-377, 1993.
- [14] W. Fulton, Introduction to Toric Varieties, The William H. Roever Lectures in Geometry. Princeton Univ. Press, 1993.
- [15] D.R. Grayson and M.E. Stillman, "Macaulay2, a Software System for Research in Algebraic Geometry," http://www.math.uiuc.edu/Macaulay2/, 2002.
- [16] G.-M. Greuel, G. Pfister, and H. Schönemann, "Singular 3.1.0: A Computer Algebra System for Polynomial Computations," http://www.singular.uni-kl.de, 2009.
- [17] J. Harris, Algebraic Geometry: A First Course. Springer-Verlag, 1992.
- [18] M.D. Hendy, "The Relationship between Simple Evolutionary Tree Models and Observable Sequence Data," Systematic Zoology, vol. 38, pp. 310-321, 1989.
- [19] M.D. Hendy and D. Penny, "Spectral Analysis of Phylogenetic Data," J. Classification, vol. 10, pp. 1-20, 1993.
- [20] M.D. Hendy and D. Penny, "Complete Families of Linear Invariants for Some Stochastic Models of Sequence Evolution, with and without the Molecular Clock Assumption," J. Computational Biology, vol. 3, no. 1, pp. 19-31, 1996.
- [21] S. Hoşten, A. Khetan, and B. Sturmfels, "Solving the Likelihood Equations," Foundations of Computational Math., vol. 5, pp. 389-407, arXiv:math.ST/0408270, 2005.
- [22] J.A. Lake, "A Rate Independent Technique for Analysis of Nucleic Acid Sequences: Evolutionary Parsimony," Molecular Biology and Evolution, vol. 4, no. 2, pp. 167-191, 1987.
- [23] F.A. Matsen, E. Mossel, and M. Steel, "Mixed-Up Trees: The Structure of Phylogenetic Mixtures," Bull. of Math. Biology, vol. 70, no. 4, pp. 1115-1139, 2008.
- [24] F.A. Matsen and M.A. Steel, "Phylogenetic Mixtures on a Single Tree Can Mimic a Tree of Another Topology," Systematic Biology, vol. 56, no. 5, pp. 767-775, 2007.
- [25] E. Mossel and E. Vigoda, "Phylogenetic MCMC Algorithms Are Misleading on Mixtures of Trees," Science, vol. 309, pp. 2207-2209, 2005.
- [26] S. Rudich, "Complexity Theory: From Gödel to Feynman," Computational Complexity Theory, vol. 10, pp. 5-87, Am. Math. Soc., 2004.
- [27] C. Semple and M. Steel, Phylogenetics, vol. 24. Oxford Univ. Press, 2003.
- [28] D. Speyer and B. Sturmfels, "The Tropical Grassmannian," Advances in Geometry, vol. 4, no. 3, pp. 389-411, 2004.
- [29] M.A. Steel and Y.X. Fu, "Classifying and Counting Linear Phylogenetic Invariants for the Jukes-Cantor Model," J. Computational Biology, vol. 2, no. 1, pp. 39-47, 1995.
- [30] B. Sturmfels, Gröbner Bases and Convex Polytopes, vol. 8. Am. Math. Soc., 1996.
- [31] B. Sturmfels and S. Sullivant, "Toric Ideals of Phylogenetic Invariants," J. Computational Biology, vol. 12, no. 2, pp. 204-228, 2005.
- [32] L. Székely, P.L. Erdös, M.A. Steel, and D. Penny, "A Fourier Inversion Formula for Evolutionary Trees," Applied Math. Letters, vol. 6, no. 2, pp. 13-17, 1993.
- [33] L.A. Székely, M.A. Steel, and P.L. Erdös, "Fourier Calculus on Evolutionary Trees," Advances in Applied Math., vol. 14, no. 2, pp. 200-210, 1993.
- [34] D. Štefankovič and E. Vigoda, "Phylogeny of Mixture Models: Robustness of Maximum Likelihood and Non-Identifiable Distributions," J. Computational Biology, vol. 14, no. 2, pp. 156-189, 2007.
- [35] J. Chai and E.A. Housworth, "On Rogers's Proof of Identifiability for the GTR + Gamma + I Model," to appear in Systematic Biology.