Subscribe

pp: 1

ABSTRACT

Rearrangements are mutations that can change the organization of a genome, but not its content. Examples are inversions of DNA segments, translocations of chromosome ends, fusions and fissions of chromosomes. All mentioned rearrangements can be represented by the generic Double Cut and Join (DCJ) operation. However, the DCJ operation also allows circular chromosomes to be created at intermediate steps, even if the compared genomes are linear. In this case it is more plausible to consider a restriction in which the reincorporation of a circular chromosome has to be done immediately after its creation. We call these two consecutive operations an ER composition. It has been shown that an ER composition mimics either an internal block interchange (when two segments in the same chromosome exchange their positions), or an internal transposition (the special case of a block interchange when the two segments are adjacent). The DCJ distance of two genomes is the same, regardless of this restriction, and can be computed in linear time. For comparing two genomes with unequal contents, in addition to rearrangements we have to allow insertions and deletions of DNA segments – named indels. It is already known that the distance in the model combining DCJ and indel operations can be exactly computed. Again, for linear genomes it would be more plausible to adopt a restricted version with ER compositions. This model was studied recently by da Silva et al. (BMC Bioinformatics 13, Suppl. 19, S14, 2012), but only an upper bound for the restricted DCJ-indel distance was provided. Here we first solve an open problem posed in that paper and present a very simple proof showing that the distance, which can be computed in linear time, is the same for both the unrestricted and the restricted DCJ-indel models. We then give a simpler algorithm for computing an optimal restricted DCJindel sorting scenario in O(n log n) time. We also relate the DCJ-indel distance to the restricted DCJ-substitution distance, which instead of indels considers a more powerful operation that allows the substitution of a DNA segment by another DNA segment. We show that the DCJ-indel distance is a 2-approximation for the restricted DCJ-substitution distance.

CITATION

Jens Stoye, "Sorting linear genomes with rearrangements and indels",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, , no. 1, pp. 1, PrePrints PrePrints, doi:10.1109/TCBB.2014.2329297