# Guest Editors' Introduction to the Special Section on Bioinformatics Research and Applications

Ion Mandoiu
Giri Narasimhan
Yi Pan
Yanqing Zhang

Pages: pp. 577-578

This special section includes a selection of papers presented at the Fifth International Symposium on Bioinformatics Research and Applications (ISBRA), which was held on 13-16 May 2009 at Nova Southeastern University in Fort Lauderdale, Florida. The ISBRA symposium provides a forum for the exchange of ideas and results among researchers, developers, and practitioners working on all aspects of bioinformatics and computational biology and their applications. In 2009, 55 papers were submitted in response to the ISBRA call for papers, of which 26 appeared in the proceedings published as volume 5542 of Springer-Verlag's Lecture Notes in Bioinformatics series. Following a rigorous review process, extended versions of six of these papers were selected for publication in this special section. The selected papers cover a broad range of bioinformatics topics, from comparative genomics and phylogenetics to population genetics, and from RNA structure prediction to the analysis of protein-protein interaction networks. Below, we briefly introduce each of them.

Munoz and Sankoff explore methods for computing rearrangement distances between genomes for which contig sequences, but not necessarily full chromosomes, are available. Addressing this problem is timely, since an increasing number of genomes are released in draft format and new algorithms are needed to extract information from such fragmentary data. The authors propose treating each contig as a chromosome, then correcting the intercontig distances computed with standard rearrangement distance algorithms to account for the number of extra fusion operations needed to assemble contigs into full chromosome sequences. For the case of comparing a fragmented genome to a complete genome, the authors show that the estimated distance depends linearly on the number of contigs and offer formulas to correct the distance estimates accordingly. A similar linear dependence is observed when both genomes are fragmented. The authors show that corrected distances can be used to accurately reconstruct the phylogeny of a group of insects, including 10 Drosophila species.

Venkatachalam et al. present new algorithms for several optimization problems on tanglegrams. Tanglegrams, which are widely used in biology to compare phylogenies and infer phenomena such as gene transfers, are pairs of rooted trees whose edges are joined by a perfect matching. The main problem considered by the authors is finding a drawing of the trees that minimizes the number of crossings between the edges of the matching. The authors give efficient algorithms for the case when one tree is fixed, and a new fixed-parameter tractable algorithm for the case when both trees can be rearranged. The latter algorithm settles an open question on the complexity of minimizing the number of crossings for $d$-ary trees with $d>2$. They also consider a variant of the problem that seeks to minimize Spearman's footrule distance instead of the crossing number, and give integer programming formulations for optimizing both objectives.
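
To make the two objectives concrete, the following minimal Python sketch evaluates the crossing number and Spearman's footrule distance for one fixed drawing of a tanglegram. It is an illustration only and not the authors' algorithm: it does not search over rearrangements of the trees, and the function names, the leaf labels, and the representation of a drawing as two leaf orders plus a dictionary for the matching are all assumptions made for this example.

```python
from itertools import combinations


def matched_positions(left_order, right_order, matching):
    """For each left leaf, in drawing order, return the position of its partner on the right."""
    right_pos = {leaf: i for i, leaf in enumerate(right_order)}
    return [right_pos[matching[leaf]] for leaf in left_order]


def crossing_number(left_order, right_order, matching):
    """Count pairs of matching edges that cross in the given drawing.

    Two edges cross exactly when their endpoints appear in opposite
    relative order on the two sides, so this counts inversions.
    """
    seq = matched_positions(left_order, right_order, matching)
    return sum(1 for a, b in combinations(seq, 2) if a > b)


def footrule_distance(left_order, right_order, matching):
    """Sum, over all matched pairs, of the absolute difference in leaf positions."""
    seq = matched_positions(left_order, right_order, matching)
    return sum(abs(i - p) for i, p in enumerate(seq))


# Hypothetical four-leaf example: only the edges a-y and b-x cross.
left = ["a", "b", "c", "d"]
right = ["x", "y", "z", "w"]
pairs = {"a": "y", "b": "x", "c": "z", "d": "w"}
print(crossing_number(left, right, pairs))    # 1
print(footrule_distance(left, right, pairs))  # 2
```

An optimization algorithm of the kind described above would vary the leaf orders permitted by the tree topologies and keep the drawing with the smallest value of the chosen objective.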

Bonizzoni et al. investigate new algorithms for the pure parsimony XOR haplotyping (PPXH) problem. For diploid organisms, the XOR genotype is a binary vector in which 0 and 1 represent homozygous and heterozygous SNP loci, respectively. Since XOR genotypes can be determined using inexpensive techniques, the problem of reconstructing haplotypes from a set of XOR genotypes naturally arises. Under the pure parsimony model, the objective is to determine a smallest set of haplotypes that explains a given set of XOR genotypes. The authors introduce a graph representation of the set of solutions and establish several interesting combinatorial properties that lead to polynomial-time algorithms for some restricted versions of the PPXH problem. They also give fixed-parameter and approximation algorithms for the general version of the problem, as well as a practical heuristic.
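
As a small illustration of the definitions above, the Python sketch below computes the XOR genotype of two binary haplotypes and checks whether a candidate haplotype set explains a set of XOR genotypes. It only verifies a proposed solution and does not search for a smallest one, so it is not any of the authors' algorithms. The function names and example vectors are hypothetical, and allowing a genotype to be formed from two copies of the same haplotype is an assumption of this sketch.

```python
from itertools import combinations_with_replacement


def xor_genotype(h1, h2):
    """Return the XOR genotype of two haplotypes: 1 marks a heterozygous locus, 0 a homozygous one."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of loci")
    return tuple(a ^ b for a, b in zip(h1, h2))


def explains(haplotypes, genotypes):
    """Check that every genotype is the XOR of some pair of haplotypes in the set.

    Assumption: a pair may use the same haplotype twice, which yields
    the all-zero (fully homozygous) genotype.
    """
    producible = {
        xor_genotype(a, b)
        for a, b in combinations_with_replacement(haplotypes, 2)
    }
    return all(tuple(g) in producible for g in genotypes)


# Hypothetical example with three SNP loci.
haps = [(0, 1, 1), (1, 1, 0), (0, 0, 0)]
print(xor_genotype(haps[0], haps[1]))             # (1, 0, 1)
print(explains(haps, [(1, 0, 1), (0, 1, 1)]))     # True
print(explains(haps, [(1, 1, 1)]))                # False
```

Under the pure parsimony objective, one would look for the smallest haplotype set for which such a check succeeds.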

Wu describes a new dynamic programming algorithm for computing the exact likelihood of a set of sequences under the infinite sites coalescent model. His method relies on a classical recurrence due to Ethier, Griffiths, and Tavaré. The key to the improved efficiency is to calculate the probabilities forward in time, from the root of the perfect phylogeny of the data toward its leaves. This results in an appreciable reduction in runtime and memory usage. The author presents experimental results showing the feasibility of exact likelihood computations for simulated and real data sets of moderate size, and uses these computations to assess the accuracy of a popular approximation method based on importance sampling.

Rajasekaran et al. present improved parsing algorithms for two types of grammars, Simple Linear Tree Adjoining Grammars (SLTAG) and Extended SLTAG (ESLTAG), previously introduced to model RNA folding with pseudoknots. The new algorithms have worst-case time complexity matching previous results of Uemura et al. but achieve improved practical performance by exploiting the sparsity of the underlying TAG parsing matrix. Experimental results on test sequences from the Rfam, Pseudobase, and tmRNA databases confirm the significantly improved practical performance.

Blin et al. introduce a new algorithm to query protein interaction networks for the presence of a subgraph similar to a given query graph. As with the previously proposed QNet method, the new algorithm uses dynamic programming and color-coding techniques to align a tree query to an arbitrary graph. To transform cyclic query graphs into trees without loss of information, the authors use a feedback vertex set and node duplications, which provides an alternative to the tree decomposition of the query used by QNet. The authors provide a Python implementation of their algorithm, called PADA1, and validate it on several PPI data sets.

We would like to thank all ISBRA authors for their high-quality contributions, and the ISBRA Program Committee and anonymous reviewers for volunteering their time and expertise to evaluate the manuscripts submitted to the symposium and the special section. Last, but not least, we would also like to thank the Editor-in-Chief, Dr. Marie-France Sagot, for providing us with the opportunity to showcase some of the exciting research presented at ISBRA in this special section of the IEEE/ACM Transactions on Computational Biology and Bioinformatics.

Ion Mandoiu

Giri Narasimhan

Yi Pan

Yanqing Zhang

Guest Editors