Issue No.05 - September/October (2011 vol.8)
pp: 1258-1272
Biing-Feng Wang, National Tsing Hua University, Hsinchu
Chien-Hsin Lin, National Tsing Hua University, Hsinchu
A gene team is a set of genes that appear in two or more species, possibly in a different order, such that on each chromosome the distance between adjacent genes of the team is at most a given threshold $\delta$. A gene team tree is a succinct representation of all gene teams for every possible value of $\delta$. In this paper, improved algorithms are presented for two problems: finding the gene teams of two chromosomes, and constructing a gene team tree of two chromosomes. For finding gene teams, Béal et al. gave an $O(n \lg^2 n)$-time algorithm. Our improved algorithm requires $O(n \lg t)$ time, where $t \le n$ is the number of gene teams. For constructing a gene team tree, Zhang and Leong gave an $O(n \lg^2 n)$-time algorithm. Our improved algorithm requires $O(n \lg n \lg\lg n)$ time. Like Béal et al.'s gene team algorithm and Zhang and Leong's gene team tree algorithm, our improved algorithms can be extended to $k$ chromosomes, with the time complexities increasing only by a factor of $k$.
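To make the definition of a gene team concrete, the following is a minimal Python sketch of a naive split-and-recurse procedure for two chromosomes. It is not the improved algorithm of this paper and makes no attempt to match the $O(n \lg t)$ bound; it only illustrates what the output looks like. The names `split_by_gap` and `gene_teams`, the dictionary input format, and the assumption that each gene occurs exactly once on each chromosome are all hypothetical choices made for this illustration.

```python
def split_by_gap(genes, pos, delta):
    """Sort genes by position and cut wherever the gap between neighbours exceeds delta."""
    ordered = sorted(genes, key=pos.__getitem__)
    runs, current = [], [ordered[0]]
    for gene in ordered[1:]:
        if pos[gene] - pos[current[-1]] > delta:
            runs.append(current)      # gap too large: close the current run
            current = [gene]
        else:
            current.append(gene)
    runs.append(current)
    return runs


def gene_teams(pos_a, pos_b, delta):
    """Return the gene teams of two chromosomes as a list of frozensets.

    pos_a and pos_b map gene name -> position (assumed: one occurrence per gene).
    """
    common = set(pos_a) & set(pos_b)
    pending = [common] if common else []
    teams = []
    while pending:
        genes = pending.pop()
        # Try to split on the first chromosome, then on the second.
        runs = split_by_gap(genes, pos_a, delta)
        if len(runs) == 1:
            runs = split_by_gap(genes, pos_b, delta)
        if len(runs) == 1:
            # No gap larger than delta on either chromosome: this set is a team.
            teams.append(frozenset(genes))
        else:
            pending.extend(runs)
    return teams


# Illustrative positions (made up for this example).
pos_a = {"a": 1, "b": 2, "c": 4, "d": 8, "e": 9}
pos_b = {"a": 9, "b": 1, "c": 2, "d": 5, "e": 4}
teams_small = gene_teams(pos_a, pos_b, delta=2)
teams_large = gene_teams(pos_a, pos_b, delta=7)
```

With `delta=2` the sketch separates the genes into the teams {a}, {b, c}, and {d, e}; with `delta=7` all five genes form a single team. Splitting never separates two genes of the same team, so repeating the split until neither chromosome shows a gap larger than `delta` leaves exactly the maximal sets.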
Algorithms, data structures, gene teams, comparative genomics, conserved gene clusters.
Biing-Feng Wang, Chien-Hsin Lin, "Improved Algorithms for Finding Gene Teams and Constructing Gene Team Trees", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 5, pp. 1258-1272, September/October 2011, doi:10.1109/TCBB.2010.127
[1] M.P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, “An Algorithmic View of Gene Teams,” Theoretical Computer Science, vol. 320, nos. 2/3, pp. 395-418, 2004.
[2] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, second ed. McGraw-Hill, 2001.
[3] F. Coulon and M. Raffinot, “Fast Algorithms for Identifying Maximal Common Connected Sets of Interval Graphs,” Discrete Applied Math., vol. 154, pp. 1709-1721, 2006.
[4] T. Dandekar, B. Snel, M. Huynen, and P. Bork, “Conservation of Gene Order: A Fingerprint for Proteins that Physically Interact,” Trends in Biochemical Sciences, vol. 23, pp. 324-328, 1998.
[5] G. Didier, “Common Intervals of Two Sequences,” Lecture Notes in Computer Science, vol. 2812, pp. 17-24, Springer, 2003.
[6] M.D. Ermolaeva, O. White, and S.L. Salzberg, “Prediction of Operons in Microbial Genomes,” Nucleic Acids Research, vol. 29, no. 5, pp. 1216-1221, 2001.
[7] A.-T. Gai, M. Habib, C. Paul, and M. Raffinot, “Identifying Common Connected Components of Graphs,” Technical Report RR-LIRMM-03016, 2003.
[8] X. He and M.H. Goldwasser, “Identifying Conserved Gene Clusters in the Presence of Homology Families,” J. Computational Biology, vol. 12, no. 6, pp. 638-656, 2005.
[9] S. Heber and J. Stoye, “Finding All Common Intervals of $k$ Permutations,” Lecture Notes in Computer Science, vol. 2089, pp. 207-218, Springer, 2001.
[10] J. JáJá, An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
[11] W.C. Lathe III, B. Snel, and P. Bork, “Gene Context Conservation of a Higher Order than Operons,” Trends in Biochemical Sciences, vol. 25, pp. 474-479, 2000.
[12] J. Lawrence, “Selfish Operons: The Evolutionary Impact of Gene Clustering in Prokaryotes and Eukaryotes,” Current Opinion in Genetics and Development, vol. 9, no. 6, pp. 642-648, 1999.
[13] C.-H. Lin, GTF&GTTC, 2010.
[14] X. Ling, X. He, D. Xin, and J. Han, “Efficiently Identifying Max-Gap Clusters in Pairwise Genome Comparison,” J. Computational Biology, vol. 15, pp. 593-609, 2008.
[15] N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot, “Gene Teams: A New Formalization of Gene Clusters for Comparative Genomics,” Computational Biology and Chemistry, vol. 27, no. 1, pp. 59-67, 2003.
[16] R. Overbeek, M. Fonstein, M. D'Souza, G.D. Pusch, and N. Maltsev, “The Use of Gene Clusters to Infer Functional Coupling,” Proc. Nat'l Academy of Sciences USA, vol. 96, no. 6, pp. 2896-2901, 1999.
[17] T. Schmidt and J. Stoye, “Quadratic Time Algorithms for Finding Common Intervals in Two and More Sequences,” Lecture Notes in Computer Science, vol. 3109, pp. 347-359, Springer, 2004.
[18] B. Snel, P. Bork, and M.A. Huynen, “The Identification of Functional Modules from the Genomic Association of Genes,” Proc. Nat'l Academy of Sciences USA, vol. 99, no. 9, pp. 5890-5895, 2002.
[19] T. Uno and M. Yagiura, “Fast Algorithms to Enumerate All Common Intervals of Two Permutations,” Algorithmica, vol. 26, no. 2, pp. 290-309, 2000.
[20] P. van Emde Boas, “Preserving Order in a Forest in Less than Logarithmic Time and Linear Space,” Information Processing Letters, vol. 6, no. 3, pp. 80-82, 1977.
[21] M. Zhang and H.W. Leong, “Gene Team Tree: A Hierarchical Representation of Gene Teams for All Gap Lengths,” J. Computational Biology, vol. 16, no. 10, pp. 1383-1398, 2009.