A New Efficient Algorithm for the Gene-Team Problem on General Sequences
March/April 2012 (vol. 9 no. 2)
pp. 330-344
Biing-Feng Wang, National Tsing Hua University, Hsinchu
Chung-Chin Kuo, National Tsing Hua University, Hsinchu
Shang-Ju Liu, National Tsing Hua University, Hsinchu
Chien-Hsin Lin, National Tsing Hua University, Hsinchu
Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A well-known model that captures the essential biological features of a conserved gene cluster is the gene-team model. This paper focuses on the problem of finding the gene teams of two general sequences. For this problem, He and Goldwasser gave an efficient algorithm that requires $O(mn)$ time using $O(m + n)$ working space, where $m$ and $n$ are, respectively, the numbers of genes in the two given sequences. In this paper, a new efficient algorithm is presented. Assume $m \le n$. Let $C = \sum_{\alpha \in \Sigma} o_1(\alpha)\, o_2(\alpha)$, where $\Sigma$ is the set of distinct genes, and $o_1(\alpha)$ and $o_2(\alpha)$ are, respectively, the numbers of copies of $\alpha$ in the two given sequences. Our new algorithm requires $O(\min\{C \lg n, mn\})$ time using $O(m + n)$ working space. Compared with He and Goldwasser's algorithm, our new algorithm is more practical, as $C$ is likely to be much smaller than $mn$ in practice. In addition, our new algorithm is output sensitive: its running time is $O(\lg n)$ times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of $k$ general sequences in $O(k C \lg(n_1 n_2 \cdots n_k))$ time, where $n_i$ is the number of genes in the $i$th input sequence.
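To make the quantity $C$ concrete, the following is a minimal sketch that counts, for two gene sequences, the number of position pairs carrying the same gene, which equals the sum of $o_1(\alpha)\, o_2(\alpha)$ over all genes. It is an illustration of the definition only, not the gene-team algorithm itself. The function name `pair_count` and the toy sequences are hypothetical, and genes are assumed to be represented as plain string labels.

```python
from collections import Counter


def pair_count(seq1, seq2):
    """Return C = sum over genes a of o1(a) * o2(a).

    o1(a) and o2(a) are the numbers of copies of gene a in seq1 and seq2.
    (Hypothetical helper: illustrates the definition, not the paper's algorithm.)
    """
    o1 = Counter(seq1)
    o2 = Counter(seq2)
    # Genes missing from seq2 contribute 0, because Counter returns 0 for absent keys.
    return sum(count * o2[gene] for gene, count in o1.items())


# Toy input (assumed for illustration): gene "a" is duplicated in both sequences.
seq1 = ["a", "b", "a", "c"]
seq2 = ["b", "a", "d", "a", "a"]

C = pair_count(seq1, seq2)
m, n = sorted((len(seq1), len(seq2)))  # enforce m <= n as in the abstract

print("C   =", C)
print("m*n =", m * n)
```

Since every matching pair is one of the $mn$ position pairs, $C \le mn$ always holds. When most genes occur only once or twice per sequence, $C$ stays close to the sequence length, which is the situation in which a bound stated in terms of $C$ is most useful.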

[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. 20th Int'l Conf. Very Large Data Bases, pp. 487-499, 1994.
[2] M.P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, “An Algorithmic View of Gene Teams,” Theoretical Computer Science, vol. 320, nos. 2/3, pp. 395-418, 2004.
[3] A. Bergeron, Y. Gingras, and C. Chauve, “Formal Models of Gene Clusters,” Bioinformatics Algorithms: Techniques and Applications, I. Mandoiu and A. Zelikovsky, eds., Chapter 8, pp. 177-202, Wiley, 2008.
[4] A. Bergeron and J. Stoye, “On the Similarity of Sets of Permutations and Its Applications to Genome Comparison,” J. Computational Biology, vol. 13, pp. 1340-1354, 2006.
[5] G. Blin and J. Stoye, “Finding Nested Common Intervals Efficiently,” J. Computational Biology, vol. 17, no. 9, pp. 1183-1194, 2010.
[6] T. Dandekar, B. Snel, M. Huynen, and P. Bork, “Conservation of Gene Order: A Fingerprint for Proteins that Physically Interact,” Trends in Biochemical Sciences, vol. 23, pp. 324-328, 1998.
[7] G. Didier, “Common Intervals of Two Sequences,” Proc. Third Int'l Workshop Algorithms in Bioinformatics, pp. 17-24, 2003.
[8] M.D. Ermolaeva, O. White, and S.L. Salzberg, “Prediction of Operons in Microbial Genomes,” Nucleic Acids Research, vol. 29, no. 5, pp. 1216-1221, 2001.
[9] X. He and M.H. Goldwasser, “Identifying Conserved Gene Clusters in the Presence of Homology Families,” J. Computational Biology, vol. 12, no. 6, pp. 638-656, 2005.
[10] S. Heber and J. Stoye, “Finding All Common Intervals of $k$ Permutations,” Proc. 12th Ann. Symp. Combinatorial Pattern Matching, pp. 207-218, 2001.
[11] C.-C. Kuo, GeneralGTF, http://venus.cs.nthu.edu.tw/~superiorGeneralGTF.html, 2010.
[12] S. Kim, J.-H. Choi, A. Saple, and J. Yang, “A Hybrid Gene Team Model and Its Application to Genome Analysis,” J. Bioinformatics and Computational Biology, vol. 4, no. 2, pp. 171-196, 2006.
[13] W.C. Lathe III, B. Snel, and P. Bork, “Gene Context Conservation of a Higher Order than Operons,” Trends in Biochemical Sciences, vol. 25, pp. 474-479, 2000.
[14] J. Lawrence, “Selfish Operons: The Evolutionary Impact of Gene Clustering in Prokaryotes and Eukaryotes,” Current Opinion in Genetics & Development, vol. 9, no. 6, pp. 642-648, 1999.
[15] X. Lin, X. He, and D. Xin, “Detecting Gene Clusters under Evolutionary Constraint in a Large Number of Genomes,” Bioinformatics, vol. 25, no. 5, pp. 571-577, 2009.
[16] N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot, “Gene Teams: A New Formalization of Gene Clusters for Comparative Genomics,” Computational Biology and Chemistry, vol. 27, no. 1, pp. 59-67, 2003.
[17] R. Overbeek, M. Fonstein, M. D'Souza, G.D. Pusch, and N. Maltsev, “The Use of Gene Clusters to Infer Functional Coupling,” Proc. Nat'l Academy of Sciences USA, vol. 96, no. 6, pp. 2896-2901, 1999.
[18] S. Rahmann and G.W. Klau, “Integer Linear Programs for Discovering Approximate Gene Clusters,” Proc. Sixth Workshop Algorithms in Bioinformatics, pp. 298-309, 2006.
[19] T. Schmidt and J. Stoye, “Quadratic Time Algorithms for Finding Common Intervals in Two and More Sequences,” Proc. 15th Ann. Symp. Combinatorial Pattern Matching, pp. 347-359, 2004.
[20] B. Snel, P. Bork, and M.A. Huynen, “The Identification of Functional Modules from the Genomic Association of Genes,” Proc. Nat'l Academy of Sciences USA, vol. 99, no. 9, pp. 5890-5895, 2002.
[21] T. Uno and M. Yagiura, “Fast Algorithms to Enumerate All Common Intervals of Two Permutations,” Algorithmica, vol. 26, no. 2, pp. 290-309, 2000.
[22] B.-F. Wang and C.-H. Lin, “Improved Algorithms for Finding Gene Teams and Constructing Gene Team Trees,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1258-1272, Sept./Oct. 2011.
[23] M. Zhang and H.W. Leong, “Gene Team Tree: A Hierarchical Representation of Gene Teams for All Gap Lengths,” J. Computational Biology, vol. 16, no. 10, pp. 1383-1398, 2009.

Index Terms:
Algorithms, data structures, gene teams, comparative genomics, conserved gene clusters.
Citation:
Biing-Feng Wang, Chung-Chin Kuo, Shang-Ju Liu, Chien-Hsin Lin, "A New Efficient Algorithm for the Gene-Team Problem on General Sequences," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 330-344, March-April 2012, doi:10.1109/TCBB.2011.96