Optimization Over a Class of Tree Shape Statistics
July-September 2007 (vol. 4 no. 3)
pp. 506-512
Tree shape statistics quantify some aspect of the shape of a phylogenetic tree. They are commonly used to compare reconstructed trees to evolutionary models and to find evidence of tree reconstruction bias. Historically, useful tree shape statistics have been found by inventing formulas by hand and then evaluating them for utility. This article presents the first method capable of optimizing over a class of tree shape statistics, called Binary Recursive Tree Shape Statistics (BRTSS). After defining the BRTSS class, we define a set of algebraic expressions that can be used in the recursions. The class of tree shape statistics definable using these expressions in the BRTSS framework is very general and includes many of the statistics with which phylogenetic researchers are already familiar. We then present a practical genetic algorithm capable of performing optimization over BRTSS given any objective function. The article concludes with a successful application of the methods to find a new statistic that indicates a significant difference between two distributions on trees that were previously postulated to have similar properties.
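To make the idea of a recursively defined tree shape statistic concrete, the following is a minimal sketch and not the article's formal BRTSS definition. It assumes a statistic is given by a base value at each leaf plus a rule that combines the values of the two subtrees at every internal node. The names `evaluate`, `colless_combine`, `sackin_combine`, and `cherry_combine`, and the nested-tuple tree encoding, are hypothetical choices made for illustration. The three familiar statistics shown (Colless index, Sackin index, number of cherries) each fit this pattern.

```python
from typing import Callable, Tuple, Union

# Assumed encoding: a tree is either LEAF or a pair (left_subtree, right_subtree).
LEAF = "leaf"
Tree = Union[str, Tuple["Tree", "Tree"]]

# Assumed state carried up the tree: (number of leaves, statistic value).
State = Tuple[int, int]


def evaluate(tree: Tree, base: State,
             combine: Callable[[State, State], State]) -> State:
    """Evaluate a recursively defined statistic bottom-up on a binary tree."""
    if tree == LEAF:
        return base
    left, right = tree
    return combine(evaluate(left, base, combine),
                   evaluate(right, base, combine))


def colless_combine(l: State, r: State) -> State:
    # Colless index: sum over internal nodes of |leaves(left) - leaves(right)|.
    return (l[0] + r[0], l[1] + r[1] + abs(l[0] - r[0]))


def sackin_combine(l: State, r: State) -> State:
    # Sackin index: sum of leaf depths; every leaf below gains one level here.
    return (l[0] + r[0], l[1] + r[1] + l[0] + r[0])


def cherry_combine(l: State, r: State) -> State:
    # Cherries: internal nodes whose two children are both leaves.
    is_cherry = 1 if (l[0] == 1 and r[0] == 1) else 0
    return (l[0] + r[0], l[1] + r[1] + is_cherry)


BASE: State = (1, 0)  # a single leaf: one leaf, statistic value zero

caterpillar = (((LEAF, LEAF), LEAF), LEAF)   # maximally unbalanced, 4 leaves
balanced = ((LEAF, LEAF), (LEAF, LEAF))      # fully balanced, 4 leaves

colless_cat = evaluate(caterpillar, BASE, colless_combine)[1]
colless_bal = evaluate(balanced, BASE, colless_combine)[1]
sackin_cat = evaluate(caterpillar, BASE, sackin_combine)[1]
sackin_bal = evaluate(balanced, BASE, sackin_combine)[1]
cherries_cat = evaluate(caterpillar, BASE, cherry_combine)[1]
cherries_bal = evaluate(balanced, BASE, cherry_combine)[1]
```

Under this framing, searching for a new statistic amounts to searching over candidate `combine` rules, which is the kind of space over which an optimizer such as a genetic algorithm could operate.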

Index Terms:
Biology and genetics, Evolutionary computing and genetic algorithms
Citation:
Frederick Matsen, "Optimization Over a Class of Tree Shape Statistics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, pp. 506-512, July-Sept. 2007, doi:10.1109/tcbb.2007.1020