Markov Invariants for Phylogenetic Rate Matrices Derived from Embedded Submodels
May-June 2012 (vol. 9 no. 3)
pp. 828-836
P. Jarvis, Sch. of Math. & Phys., Univ. of Tasmania, Hobart, TAS, Australia
J. Sumner, Sch. of Math. & Phys., Univ. of Tasmania, Hobart, TAS, Australia
We consider novel phylogenetic models with rate matrices that arise via the embedding of a progenitor model on a small number of character states into a target model on a larger number of character states. Adapting representation-theoretic results from recent investigations of Markov invariants for the general rate matrix model, we give a prescription for identifying and counting Markov invariants for such "symmetric embedded" models, and we provide enumerations of these for the first few cases with a small number of character states. The simplest example is a target model on three states, constructed from a general two-state model: the "$2 \hookrightarrow 3$" embedding. We show that for two taxa there exist two invariants of quadratic degree that can be used to directly infer pairwise distances from observed sequences under this model. A simple simulation study verifies their theoretical expected values and suggests that, provided the model class is appropriate, they have better statistical properties than the standard (log) Det invariant (which is of cubic degree in this case).
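
For orientation, the standard (log) Det invariant mentioned above can be illustrated with a minimal sketch. It relies only on the well-known identity det(exp(Qt)) = exp(t tr Q), so that -ln det of a transition matrix recovers -t tr(Q); for three states the determinant is a cubic polynomial in the matrix entries. The rate matrix values, the elapsed time, and all function names below are hypothetical choices made for illustration. The sketch does not implement the quadratic invariants of the paper, whose explicit form is not given in this abstract.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=30):
    """Approximate exp(q * t) by a truncated Taylor series."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms):
        # term becomes (q t)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def det3(m):
    """Determinant of a 3x3 matrix: a cubic polynomial in its entries."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def log_det_distance(p):
    """Return -ln det(p) for a 3x3 transition matrix p."""
    return -math.log(det3(p))

# Hypothetical three-state rate matrix (rows sum to zero) and elapsed time.
Q = [[-0.3, 0.1, 0.2],
     [0.2, -0.5, 0.3],
     [0.1, 0.4, -0.5]]
t = 0.7

P = mat_exp(Q, t)                      # transition matrix, rows sum to one
distance = log_det_distance(P)         # -ln det P
expected = -t * sum(Q[i][i] for i in range(3))  # -t * tr(Q)

print(distance, expected)
```

In practice the transition matrix is not known and the determinant would be evaluated on observed pairwise pattern frequencies, which is where the statistical behaviour compared in the simulation study comes into play.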

Index Terms:
statistical analysis, embedded systems, evolution (biological), genetics, Markov processes, M-theory, physiological models, standard det invariant, Markov invariants, phylogenetic rate matrices, progenitor model, general rate matrix model, symmetric embedded models, statistical properties, Phylogeny, Adaptation models, Polynomials, Tensile stress, Algebra, Biological system modeling, representation theory, Markov chains
P. Jarvis, J. Sumner, "Markov Invariants for Phylogenetic Rate Matrices Derived from Embedded Submodels," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 828-836, May-June 2012, doi:10.1109/TCBB.2012.24