This issue contains papers invited from among those presented at the Workshop on Algorithms in Bioinformatics (WABI 2004) in Bergen, Norway, September 2004. The workshop was held as part of the ALGO 2004 Conference, which also hosted the European Symposium on Algorithms, the Workshop on Approximation and Online Algorithms, the International Workshop on Parameterized and Exact Computation, and the Fourth Workshop on Algorithmic Methods and Models for Optimization of Railways. For WABI, we received 117 submissions, from which the 33-member program committee selected 39 to be presented in Bergen and published in the proceedings (volume 3240 of the Lecture Notes in Bioinformatics, series editors S. Istrail, P. Pevzner, and M. Waterman). From these 39, the WABI program committee selected a small number whose authors were invited to submit extended versions of their papers to a special section of the IEEE/ACM Transactions on Computational Biology and Bioinformatics. Four of the papers are included in this issue and the remaining papers will appear in an upcoming issue. An important goal of WABI is to promote practical algorithms rather than theoretical results; thus, all of the papers presented at the conference and the papers presented here describe methods that have been implemented as working programs and tested on empirical data. We would like to thank both the entire WABI program committee and the reviewers of the papers presented in this issue for their valuable contributions.
The first paper in this issue, "Maximum-Scoring Segment Sets" by Miklós Csűrös, describes methods that run in linear or O(n log n) time for finding maximum-scoring segment sets. Many biological problems have been formulated by considering a scoring scheme over linear sequences. For example, a score might be given to each amino acid residue according to its hydrophobicity or to each aligned column according to its degree of similarity. Given such a scoring scheme, functionally important segments are delineated by finding the segments that have the maximal score. This paper examines a generalization of the problem in which multiple disjoint segments are allowed. The computational efficiency achieved here is a significant improvement over quadratic algorithms. Two theorems are given on the hierarchical structure of maximum-scoring segment sets, and two efficient algorithms to find them are presented. The paper also presents an analysis of AT-rich thermophilic genomes to find noncoding RNAs.
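To make the problem statement concrete, the following minimal sketch finds at most k disjoint segments of maximal total score using a straightforward O(nk) dynamic program. It is not the algorithm of the paper, whose linear and O(n log n) methods rely on the hierarchical structure mentioned above; the function name `max_scoring_segments`, the half-open segment convention, and the example scores are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def max_scoring_segments(
    scores: Sequence[float], k: int
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total, segments) for at most k disjoint segments.

    Segments are half-open index pairs (start, end). This is a plain
    O(n*k) dynamic program used for illustration, not the paper's method.
    """
    n = len(scores)
    neg_inf = float("-inf")
    # closed[j][i]: best total over scores[:i] using at most j segments.
    # open_[j][i]:  same, but the last segment must end at position i-1.
    closed = [[0.0] * (n + 1) for _ in range(k + 1)]
    open_ = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            # Either extend the current segment or start a new one here.
            open_[j][i] = scores[i - 1] + max(open_[j][i - 1], closed[j - 1][i - 1])
            closed[j][i] = max(closed[j][i - 1], open_[j][i])

    # Trace back through the tables to recover the chosen segments.
    segments: List[Tuple[int, int]] = []
    i, j = n, k
    while i > 0 and j > 0:
        if closed[j][i] == closed[j][i - 1]:
            i -= 1  # position i-1 lies outside every segment
            continue
        end = i  # a segment ends at position i-1
        while open_[j][i - 1] >= closed[j - 1][i - 1]:
            i -= 1  # the segment extends one position to the left
        segments.append((i - 1, end))
        i -= 1
        j -= 1
    segments.reverse()
    return closed[k][n], segments


if __name__ == "__main__":
    example = [2, -3, 4, -1, 5, -6, 3]  # hypothetical per-position scores
    print(max_scoring_segments(example, 2))
```

With k = 1 this reduces to the classical single maximum-scoring segment problem; larger k lets the selection skip over low-scoring stretches instead of absorbing them into one segment.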
The second paper, "Phylogenetic Super-Networks from Partial Trees" by Daniel H. Huson, Tobias Dezulian, Tobias Klöpper, and Mike A. Steel, deals with the problem of estimating phylogenetic trees of species using trees estimated for different individual genes. As genome-scale data sets become more common, it is becoming increasingly important to address the estimation of large-scale phylogenies from the combination of partial trees. Since not all genes are present or characterized in all species, the gene trees may be considered as partial. The paper presents methods for constructing networks consistent with all partial trees and gives an example application where a phylogenetic supernetwork for 63 taxa is produced based on five partial trees.
The third paper is "An Algorithm for Discovering Optimal Boolean Pattern Pairs" by Hideo Bannai, Heikki Hyyrö, Ayumi Shinohara, Masayuki Takeda, Kenta Nakai, and Satoru Miyano. The authors formulate a problem in which the input is a set of sequences, each associated with a numerical value, and the aim is to find a Boolean combination of patterns such that whether a sequence satisfies the Boolean expression is related to the numerical value associated with that sequence. This class of problems is motivated by the empirical finding that distributed conjunctions of sequence motifs, e.g., the presence of one pattern in conjunction with the absence of another, are important for biological function. A quadratic algorithm is given, based on suffix trees and on techniques used by Hui in 1992 for the color set size problem.
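The problem formulation can be illustrated with a brute-force sketch restricted to one Boolean form, "pattern p present and pattern q absent". The scoring function used here (absolute difference between the mean values of the sequences that satisfy the expression and those that do not), the function names, and the toy data are all assumptions for illustration; the paper's algorithm is based on suffix trees and is far more efficient than this exhaustive search.

```python
from itertools import product
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def candidate_patterns(seqs: Sequence[str], max_len: int) -> List[str]:
    """All distinct substrings of the input sequences up to max_len."""
    return sorted(
        {
            s[i : i + length]
            for s in seqs
            for length in range(1, max_len + 1)
            for i in range(len(s) - length + 1)
        }
    )


def best_presence_absence_pair(
    seqs: Sequence[str], values: Sequence[float], max_len: int = 3
) -> Tuple[float, Optional[Tuple[str, str]]]:
    """Exhaustively search pairs (p, q) for the expression 'p and not q'.

    The score is an assumed example: |mean(satisfying) - mean(others)|.
    """
    patterns = candidate_patterns(seqs, max_len)
    best_score, best_pair = float("-inf"), None
    for p, q in product(patterns, repeat=2):
        hit = [v for s, v in zip(seqs, values) if p in s and q not in s]
        miss = [v for s, v in zip(seqs, values) if not (p in s and q not in s)]
        if not hit or not miss:
            continue  # the expression does not split the sequences
        score = abs(mean(hit) - mean(miss))
        if score > best_score:
            best_score, best_pair = score, (p, q)
    return best_score, best_pair


if __name__ == "__main__":
    toy_seqs = ["ACGT", "ACGG", "TTGT", "ACTT"]  # hypothetical sequences
    toy_values = [5.0, 1.0, 1.0, 5.0]  # hypothetical numerical values
    print(best_presence_absence_pair(toy_seqs, toy_values))
```

Several pattern pairs can achieve the same best score on data this small, so the pair returned depends on the enumeration order.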
The fourth paper is "A Polynomial-Time Algorithm for the Matching of Crossing Contact-Map Patterns" by Jens Gramm. Contact maps are used to characterize the biophysical interaction of biopolymers, including the residue interactions characterizing their 3D structures and interprotein biochemical reactions. Computational characterization of such patterns remains a difficult problem. The paper presented here solves an open problem posed by Vialette in 2004 by providing a polynomial-time algorithm based on dynamic programming. The paper also demonstrates the efficiency of the method by applying it to a set of real protein structures.
• J. Kim is with the Department of Biology, University of Pennsylvania, 3451 Walnut Street, Philadelphia, PA 19104.
• I. Jonassen is with the Department of Informatics and Computational Biology Unit, University of Bergen, HIB N5020 Bergen, Norway.
I. Jonassen is a professor of computer science in the Department of Informatics at the University of Bergen in Norway, where he is a member of the bioinformatics group. He is also affiliated with the Bergen Center for Computational Science at the same university, where he heads the Computational Biology Unit. Jonassen is also vice president of the Society for Bioinformatics in the Nordic Countries (SocBiN) and a member of the board of the Nordic Bioinformatics Network. He coordinates the technology platform for bioinformatics funded by the Norwegian Research Council functional genomics programme FUGE. He has worked in the field of bioinformatics since the early 1990s, focusing primarily on methods for the discovery of patterns, with applications to biological sequences and structures, and on methods for the analysis of microarray gene expression data.
J. Kim is the Edmund J. and Louise Kahn Term Endowed Professor in the Department of Biology at the University of Pennsylvania. He holds joint appointments in the Department of Computer and Information Science, Penn Center for Bioinformatics, and the Penn Genomics Institute. He serves on the editorial boards of Molecular Development and Evolution and the IEEE/ACM Transactions on Computational Biology and Bioinformatics, the council of the Society for Systematic Biology, and the executive committee of the Cyber Infrastructure for Phylogenetics Research. His research focuses on computational and experimental approaches to comparative development. The current focus of his lab is in three areas: computational phylogenetics, in silico gene discovery, and comparative development using genome-wide gene expression data.