Unidentifiable Divergence Times in Rates-across-Sites Models
July-September 2004 (vol. 1 no. 3)
pp. 130-134
The rates-across-sites assumption in phylogenetic inference posits that the rate matrix governing the Markovian evolution of a character on an edge of the putative phylogenetic tree is the product of a character-specific scale factor and a rate matrix that is particular to that edge. Thus, evolution follows essentially the same process for all characters, except that it occurs faster for some characters than for others. To allow estimation of tree topologies and edge lengths for such models, it is commonly assumed that the scale factors are not arbitrary unknown constants, but rather unobserved, independent, identically distributed draws from a member of some parametric family of distributions. A popular choice is the gamma family. We consider an example of a clock-like tree with three taxa, one unknown edge length, a known root state, and a parametric family of scale factor distributions that contains the gamma family. This model has the property that, for a generic choice of unknown edge length and scale factor distribution, there is another edge length and scale factor distribution that generates data with exactly the same distribution, so that even with infinitely many data it will typically be impossible to make correct inferences about the unknown edge length.
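
The confounding of edge length and scale factor distribution can be illustrated in a much simpler setting than the one treated in the paper. The following Python sketch is an assumption-laden toy, not the authors' construction: it uses a single edge, a two-state symmetric chain with rate matrix r * [[-1, 1], [1, -1]], and gamma scale factors with mean 1. The function names and the numerical values are hypothetical. Averaging over the scale factor r replaces exp(-2 r t) by the gamma Laplace transform (1 + 2t/alpha)^(-alpha), so different pairs (t, alpha) can give the same distribution for the observed end state.

```python
import math


def p_same_fixed_rate(t, r):
    """P(end state == start state) on an edge of length t for a two-state
    symmetric chain whose rate matrix is r * [[-1, 1], [1, -1]]."""
    return 0.5 + 0.5 * math.exp(-2.0 * r * t)


def p_same_gamma(t, alpha):
    """The same probability averaged over r ~ Gamma(shape=alpha, mean=1).

    Uses the Laplace transform E[exp(-s r)] = (1 + s / alpha) ** (-alpha)
    with s = 2 t.
    """
    return 0.5 + 0.5 * (1.0 + 2.0 * t / alpha) ** (-alpha)


def matching_edge_length(t, alpha, alpha_new):
    """Edge length t_new for which (t_new, alpha_new) yields the same
    single-edge distribution as (t, alpha)."""
    target = (1.0 + 2.0 * t / alpha) ** (-alpha)
    return 0.5 * alpha_new * (target ** (-1.0 / alpha_new) - 1.0)


# Hypothetical values chosen only for illustration.
t, alpha, alpha_new = 0.3, 1.0, 2.0
t_new = matching_edge_length(t, alpha, alpha_new)

print("original edge length:   ", t)
print("alternative edge length:", t_new)
print("P(same), original:      ", p_same_gamma(t, alpha))
print("P(same), alternative:   ", p_same_gamma(t_new, alpha_new))
```

The two printed probabilities agree although the edge lengths differ, so data from this single edge cannot separate the two parameter choices. This does not by itself establish the three-taxon result stated in the abstract, which concerns a clock-like tree and a family of distributions larger than the gamma family; it only shows the kind of trade-off between edge length and rate distribution that is at issue.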


Index Terms:
Phylogenetic inference, random effects, gamma distribution, identifiability.
Citation:
Steven N. Evans, Tandy Warnow, "Unidentifiable Divergence Times in Rates-across-Sites Models," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 3, pp. 130-134, July-Sept. 2004, doi:10.1109/TCBB.2004.34