The Fourth International Workshop on Algorithms in Bioinformatics (WABI) 2004 was held in Bergen, Norway, in September 2004. The program committee consisted of 33 members and selected, from among 117 submissions, 39 papers to be presented at the workshop and included in the workshop proceedings (volume 3240 of Lecture Notes in Bioinformatics, series edited by Sorin Istrail, Pavel Pevzner, and Michael Waterman).
The WABI 2004 program committee selected a small number of the 39 papers and invited their authors to submit extended versions to a special section of the IEEE/ACM Transactions on Computational Biology and Bioinformatics. Four papers were published in the October-December 2004 issue of the journal, and this issue contains an additional three papers. We would like to thank both the entire program committee for WABI and the reviewers of the papers in this issue for their valuable contributions.
The first of the papers is "A New Distance for High Level RNA Secondary Structure Comparison," authored by Julien Allali and Marie-France Sagot. This paper describes algorithms for comparing secondary structures of RNA molecules where the structures are represented by trees. The problem of classifying RNA secondary structure is becoming critical as biologists are discovering more and more noncoding functional elements in the genome (e.g., miRNA). Most likely, the major functional determinant of these elements is their secondary structure and, therefore, a metric between such secondary structures will also help delineate clusters of functional groups. In Allali and Sagot's paper, two tree representations of secondary structure are compared by analyzing how one tree can be transformed into the other using an allowed set of operations. Each operation is associated with a cost, and the distance between two trees is then defined as the minimum total cost of transforming one tree into the other. Allali and Sagot introduce two new operations, which they name edge fusion and node fusion, and show that these alleviate limitations associated with the classical tree edit operations used for RNA comparison. Importantly, they also present algorithms for calculating the distance between trees when the new operations are allowed in addition to the classical ones, and they analyze the performance of the algorithms.
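To make the notion of a minimum-cost transformation concrete, the following minimal sketch computes the classical edit distance between two ordered, labeled trees with unit costs for deleting, inserting, and relabeling a node. It does not implement the edge fusion and node fusion operations of Allali and Sagot. The tree encoding, the function names, and the unit costs are assumptions chosen for illustration only.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Every operation has unit cost.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Classical edit distance between two ordered forests."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)  # insert every node of f2
    if not f2:
        return size(f1)  # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2; its children take its place.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    relabel = (forest_distance(kids1, kids2)
               + forest_distance(f1[:-1], f2[:-1])
               + (0 if label1 == label2 else 1))
    return min(delete, insert, relabel)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Hypothetical example trees with single-letter labels.
t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()),))
print(tree_edit_distance(t1, t2))
```

This memoized recursion is exponential in the worst case and is meant only to illustrate the definition; practical algorithms for tree comparison organize the same recurrence far more efficiently.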
The second paper is "Topological Rearrangements and Local Search Method for Tandem Duplication Trees," authored by Denis Bertrand and Olivier Gascuel. The paper approaches the problem of estimating the evolutionary history of tandem repeats. A tandem repeat is a stretch of DNA sequence that contains an element that is repeated multiple times and where the repeat occurrences are next to each other in the sequence. Since the repeats are subject to mutations, they are not identical. Tandem repeats arise through evolution by "copying" (duplication) of repeat elements in blocks of varying size. Bertrand and Gascuel address the problem of finding the most likely sequence of events giving rise to the observed set of repeats. Each sequence of events can be described by a duplication tree, and one searches for the tree that is the most parsimonious, i.e., one that explains how the sequence has evolved from an ancestral single copy with a minimum number of mutations along the branches of the tree. The main difference from the standard phylogeny problem is that the linear ordering of the tandem duplications imposes constraints on the possible binary tree forms. This paper describes a local search method that allows exploration of the complete space of possible duplication trees and shows that the method is superior to other existing methods for reconstructing the tree and recovering its duplication events.
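As an illustration of the parsimony criterion only, the sketch below counts the minimum number of mutations needed on a fixed rooted binary tree using Fitch's small-parsimony rule. It is not the local search method of Bertrand and Gascuel and it ignores the ordering constraints of duplication trees. The nested-tuple tree encoding, the function name, and the toy sequences are assumptions, and the repeat copies are assumed to be aligned and of equal length.

```python
def fitch_score(tree):
    """Minimum number of mutations on a fixed rooted binary tree.

    A leaf is an aligned sequence (str); an internal node is a pair
    (left, right). Both conventions are assumptions of this sketch.
    """
    def visit(node):
        if isinstance(node, str):
            # One singleton state set per site, no mutations yet.
            return [{ch} for ch in node], 0
        left, right = node
        left_sets, left_cost = visit(left)
        right_sets, right_cost = visit(right)
        sets, cost = [], left_cost + right_cost
        for a, b in zip(left_sets, right_sets):
            common = a & b
            if common:
                sets.append(common)
            else:
                # Disjoint state sets force one mutation at this site.
                sets.append(a | b)
                cost += 1
        return sets, cost

    return visit(tree)[1]

# Two hypothetical topologies over the same four repeat copies.
tree_a = (("AAC", "AAG"), ("ATG", "ATG"))
tree_b = (("AAC", "ATG"), ("AAG", "ATG"))
print(fitch_score(tree_a), fitch_score(tree_b))
```

A search over tree topologies would call such a scoring function on each candidate tree and keep the one with the lowest score.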
The third paper is "Optimizing Multiple Seeds for Homology Search," authored by Daniel G. Brown. The paper presents an approach to selecting starting points for pairwise local alignments of protein sequences. The problem of pairwise local alignment is to find a segment from each sequence such that the two segments can be aligned to obtain a high score. For commonly used scoring schemes, this can be solved exactly using dynamic programming. However, pairwise alignment is frequently applied to large data sets, and heuristic methods for restricting the alignments to be considered are frequently used, for instance, in the BLAST programs. The key is to restrict the number of alignments as much as possible, by choosing a few good seeds, without missing high-scoring alignments. The paper shows that this can be formulated as an integer programming problem and presents an algorithm for choosing optimal seeds. The analysis presented shows that the approach gives four times fewer false positives (unnecessary seeds) than BLASTP without losing more good hits.
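The exact dynamic programming solution mentioned above can be sketched as follows. This is a minimal Smith-Waterman-style score computation, not the seed optimization method of the paper. The function name and the scoring parameters (match, mismatch, and a linear gap penalty) are assumptions for illustration; protein alignment in practice would use a substitution matrix instead.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Uses a simple match/mismatch score and a linear gap penalty;
    these values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(local_alignment_score("TTACGTT", "GGACGGG"))
```

The running time is proportional to the product of the two sequence lengths, which is why seed-based heuristics are used to decide where such alignments are worth computing on large data sets.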
• J. Kim is with the Department of Biology, University of Pennsylvania, 3451 Walnut Street, Philadelphia, PA 19104.
• I. Jonassen is with the Department of Informatics and Computational Biology Unit, University of Bergen, HIB N5020 Bergen, Norway.
J. Kim is the Edmund J. and Louise Kahn Term Endowed Professor in the Department of Biology at the University of Pennsylvania. He holds joint appointments in the Department of Computer and Information Science, Penn Center for Bioinformatics, and the Penn Genomics Institute. He serves on the editorial boards of Molecular Development and Evolution and the IEEE/ACM Transactions on Computational Biology and Bioinformatics, the council of the Society for Systematic Biology, and the executive committee of the Cyber Infrastructure for Phylogenetics Research. His research focuses on computational and experimental approaches to comparative development. His lab currently focuses on three areas: computational phylogenetics, in silico gene discovery, and comparative development using genome-wide gene expression data.
I. Jonassen is a professor of computer science in the Department of Informatics at the University of Bergen in Norway, where he is a member of the bioinformatics group. He is also affiliated with the Bergen Center for Computational Science at the same university, where he heads the Computational Biology Unit. He is also vice president of the Society for Bioinformatics in the Nordic Countries (SocBiN) and a member of the board of the Nordic Bioinformatics Network. He coordinates the technology platform for bioinformatics funded by the Norwegian Research Council functional genomics programme FUGE. He has worked in the field of bioinformatics since the early 1990s and has primarily focused on methods for discovery of patterns with applications to biological sequences and structures and on methods for the analysis of microarray gene expression data.