IEEE/ACM Transactions on Computational Biology and Bioinformatics

Vol. 9, Issue No. 02, March/April 2012, pp. 330-344

Biing-Feng Wang , National Tsing Hua University, Hsinchu

Chung-Chin Kuo , National Tsing Hua University, Hsinchu

Shang-Ju Liu , National Tsing Hua University, Hsinchu

Chien-Hsin Lin , National Tsing Hua University, Hsinchu

DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.96

ABSTRACT

Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A well-known model that captures the essential biological features of a conserved gene cluster is the gene-team model. This paper focuses on the problem of finding the gene teams of two general sequences. For this problem, He and Goldwasser gave an efficient algorithm that requires $O(mn)$ time using $O(m + n)$ working space, where $m$ and $n$ are, respectively, the numbers of genes in the two given sequences. In this paper, a new efficient algorithm is presented. Assume $m \le n$. Let $C = \sum_{\alpha \in \Sigma} o_1(\alpha)\, o_2(\alpha)$, where $\Sigma$ is the set of distinct genes, and $o_1(\alpha)$ and $o_2(\alpha)$ are, respectively, the numbers of copies of $\alpha$ in the two given sequences. Our new algorithm requires $O(\min\{C \lg n, mn\})$ time using $O(m + n)$ working space. Compared with He and Goldwasser's algorithm, our new algorithm is more practical, as $C$ is likely to be much smaller than $mn$ in practice. In addition, our new algorithm is output sensitive: its running time is $O(\lg n)$ times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of $k$ general sequences in $O(kC \lg(n_1 n_2 \cdots n_k))$ time, where $n_i$ is the number of genes in the $i$th input sequence.
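To make the quantity $C$ concrete, here is a minimal sketch that computes it directly from its definition and reports the two terms inside the $O(\min\{C \lg n, mn\})$ bound. It assumes each sequence is given as a Python list of gene labels, with repeated labels standing for copies of the same gene; the function names `pair_count_C` and `bound_terms` are hypothetical illustrations, not part of the paper's algorithm, which this sketch does not implement.

```python
from collections import Counter
import math


def pair_count_C(seq1, seq2):
    """Compute C = sum over distinct genes alpha of o1(alpha) * o2(alpha).

    seq1, seq2: lists of gene labels (assumed representation); repeated
    labels are copies of the same gene.
    """
    o1 = Counter(seq1)  # o1(alpha): copies of alpha in the first sequence
    o2 = Counter(seq2)  # o2(alpha): copies of alpha in the second sequence
    # A Counter returns 0 for missing keys, so genes absent from seq2 add nothing.
    return sum(copies * o2[gene] for gene, copies in o1.items())


def bound_terms(seq1, seq2):
    """Return (C * lg n, m * n), the two terms inside O(min{C lg n, mn}), with m <= n."""
    m, n = sorted((len(seq1), len(seq2)))
    lg_n = math.log2(n) if n > 0 else 0.0  # guard against empty input
    return pair_count_C(seq1, seq2) * lg_n, m * n


if __name__ == "__main__":
    g1 = ["a", "b", "c", "a"]
    g2 = ["a", "c", "d", "a", "e"]
    print(pair_count_C(g1, g2))  # 5: gene a contributes 2*2, gene c contributes 1*1
    print(bound_terms(g1, g2))
```

Each product $o_1(\alpha)\, o_2(\alpha)$ counts the pairs of positions at which the two sequences carry the same gene $\alpha$, so $C$ never exceeds $mn$; when neither sequence contains duplicate genes, $C \le m$.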

INDEX TERMS

Algorithms, data structures, gene teams, comparative genomics, conserved gene clusters.

CITATION

Biing-Feng Wang, Chung-Chin Kuo, Shang-Ju Liu, and Chien-Hsin Lin, "A New Efficient Algorithm for the Gene-Team Problem on General Sequences," *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 9, no. 2, pp. 330-344, March/April 2012, doi:10.1109/TCBB.2011.96.

REFERENCES

- [1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. 20th Int'l Conf. Very Large Data Bases, pp. 487-499, 1994.
- [2] M.P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, “An Algorithmic View of Gene Teams,” Theoretical Computer Science, vol. 320, nos. 2/3, pp. 395-418, 2004.
- [3] A. Bergeron, Y. Gingras, and C. Chauve, “Formal Models of Gene Clusters,” Bioinformatics Algorithms: Techniques and Applications, I. Mandoiu and A. Zelikovsky, eds., Chapter 8, pp. 177-202, Wiley, 2008.
- [4] A. Bergeron and J. Stoye, “On the Similarity of Sets of Permutations and Its Applications to Genome Comparison,” J. Computational Biology, vol. 13, pp. 1340-1354, 2006.
- [5] G. Blin and J. Stoye, “Finding Nested Common Intervals Efficiently,” J. Computational Biology, vol. 17, no. 9, pp. 1183-1194, 2010.
- [6] T. Dandekar, B. Snel, M. Huynen, and P. Bork, “Conservation of Gene Order: A Fingerprint for Proteins that Physically Interact,” Trends in Biochemical Sciences, vol. 23, pp. 324-328, 1998.
- [7] G. Didier, “Common Intervals of Two Sequences,” Proc. Third Int'l Workshop Algorithms in Bioinformatics, pp. 17-24, 2003.
- [8] M.D. Ermolaeva, O. White, and S.L. Salzberg, “Prediction of Operons in Microbial Genomes,” Nucleic Acids Research, vol. 29, no. 5, pp. 1216-1221, 2001.
- [9] X. He and M.H. Goldwasser, “Identifying Conserved Gene Clusters in the Presence of Homology Families,” J. Computational Biology, vol. 12, no. 6, pp. 638-656, 2005.
- [10] S. Heber and J. Stoye, “Finding All Common Intervals of $k$ Permutations,” Proc. 12th Ann. Symp. Combinatorial Pattern Matching, pp. 207-218, 2001.
- [11] C.-C. Kuo, GeneralGTF, http://venus.cs.nthu.edu.tw/~superiorGeneralGTF.html, 2010.
- [12] S. Kim, J.-H. Choi, A. Saple, and J. Yang, “A Hybrid Gene Team Model and Its Application to Genome Analysis,” J. Bioinformatics and Computational Biology, vol. 4, no. 2, pp. 171-196, 2006.
- [13] W.C. Lathe III, B. Snel, and P. Bork, “Gene Context Conservation of a Higher Order than Operons,” Trends in Biochemical Sciences, vol. 25, pp. 474-479, 2000.
- [14] J. Lawrence, “Selfish Operons: The Evolutionary Impact of Gene Clustering in Prokaryotes and Eukaryotes,” Current Opinion in Genetics & Development, vol. 9, no. 6, pp. 642-648, 1999.
- [15] X. Lin, X. He, and D. Xin, “Detecting Gene Clusters under Evolutionary Constraint in a Large Number of Genomes,” Bioinformatics, vol. 25, no. 5, pp. 571-577, 2009.
- [16] N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot, “Gene Teams: A New Formalization of Gene Clusters for Comparative Genomics,” Computational Biology and Chemistry, vol. 27, no. 1, pp. 59-67, 2003.
- [17] R. Overbeek, M. Fonstein, M. D'Souza, G.D. Pusch, and N. Maltsev, “The Use of Gene Clusters to Infer Functional Coupling,” Proc. Nat'l Academy of Sciences USA, vol. 96, no. 6, pp. 2896-2901, 1999.
- [18] S. Rahmann and G.W. Klau, “Integer Linear Programs for Discovering Approximate Gene Clusters,” Proc. Sixth Workshop Algorithms in Bioinformatics, pp. 298-309, 2006.
- [19] T. Schmidt and J. Stoye, “Quadratic Time Algorithms for Finding Common Intervals in Two and More Sequences,” Proc. 15th Ann. Symp. Combinatorial Pattern Matching, pp. 347-359, 2004.
- [20] B. Snel, P. Bork, and M.A. Huynen, “The Identification of Functional Modules from the Genomic Association of Genes,” Proc. Nat'l Academy of Sciences USA, vol. 99, no. 9, pp. 5890-5895, 2002.
- [21] T. Uno and M. Yagiura, “Fast Algorithms to Enumerate All Common Intervals of Two Permutations,” Algorithmica, vol. 26, no. 2, pp. 290-309, 2000.
- [22] B.-F. Wang and C.-H. Lin, “Improved Algorithms for Finding Gene Teams and Constructing Gene Team Trees,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1258-1272, Sept./Oct. 2011.
- [23] M. Zhang and H.W. Leong, “Gene Team Tree: A Hierarchical Representation of Gene Teams for All Gap Lengths,” J. Computational Biology, vol. 16, no. 10, pp. 1383-1398, 2009.