Assignment of Orthologous Genes via Genome Rearrangement
October-December 2005 (vol. 2 no. 4)
pp. 302-315

Abstract: The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationships among genes of the same family. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at the genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows marginally better sensitivity on the real data set because it identifies several correct orthologous pairs that INPARANOID misses. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm both in computing the reversal distance and in assigning correct orthologs.
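To make the role of cycle decomposition concrete, the following is a minimal sketch of the classical duplicate-free case only, not the SOAR heuristics. It assumes each gene appears exactly once, so one genome can be written as a signed permutation of 1..n relative to the other. Under that assumption, the breakpoint graph splits uniquely into alternating cycles, and n + 1 minus the number of cycles is a lower bound on the signed reversal distance (the hurdle and fortress terms of the exact formula are ignored here). The function names `count_cycles` and `reversal_lower_bound` are hypothetical and chosen for illustration.

```python
def count_cycles(perm):
    """Count alternating cycles in the breakpoint graph of a signed
    permutation of 1..n (duplicate-free case, assumed here)."""
    n = len(perm)
    # Replace each signed gene g by two vertices: +g -> (2g-1, 2g),
    # -g -> (2g, 2g-1), and frame the sequence with 0 and 2n+1.
    ext = [0]
    for g in perm:
        a, b = 2 * abs(g) - 1, 2 * abs(g)
        ext.extend((a, b) if g > 0 else (b, a))
    ext.append(2 * n + 1)

    # Black edges join vertices adjacent in the given gene order.
    black = {}
    for i in range(0, len(ext), 2):
        u, v = ext[i], ext[i + 1]
        black[u], black[v] = v, u

    # Gray edges join vertices adjacent in the identity order:
    # (0,1), (2,3), ..., (2n, 2n+1), i.e. v <-> v XOR 1.
    seen = set()
    cycles = 0
    for start in ext:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]      # follow the black edge
            seen.add(w)
            v = w ^ 1         # then follow the gray edge
    return cycles


def reversal_lower_bound(perm):
    """Lower bound n + 1 - c on the signed reversal distance."""
    return len(perm) + 1 - count_cycles(perm)


if __name__ == "__main__":
    for p in ([1, 2, 3], [-1], [-2, -1], [2, 1]):
        print(p, count_cycles(p), reversal_lower_bound(p))
```

When gene families contain duplicates, the gray edges are no longer determined uniquely, which is why choosing a decomposition with as many cycles as possible becomes an optimization problem in its own right.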

Index Terms:
Ortholog, paralog, gene duplication, genome rearrangement, reversal, comparative genomics.
Xin Chen, Jie Zheng, Zheng Fu, Peng Nan, Yang Zhong, Stefano Lonardi, Tao Jiang, "Assignment of Orthologous Genes via Genome Rearrangement," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 4, pp. 302-315, Oct.-Dec. 2005, doi:10.1109/TCBB.2005.48