Issue No. 03 - July-September (2008 vol. 5)
The trans-genomic query (TGQ) problem -- enabling the free query of biological information, even across genomes -- is a central challenge facing bioinformatics. Solutions to this problem can alter the nature of the field, moving it beyond the jungle of data integration and expanding the number and scope of questions that can be answered.
An alignment table is a binary relationship on locations (sequence segments). An important special case of alignment tables are hit tables: tables of pairs of highly similar segments produced by alignment tools like BLAST. However, alignment tables also include general binary relationships, and can represent any useful connection between sequence locations. They can be curated, and provide a high-quality queryable backbone of connections between biological information. Alignment tables thus can be a natural foundation for TGQ, as they permit a central part of the TGQ problem to be reduced to purely technical problems involving tables of locations.
Key challenges in implementing alignment tables include efficient representation and indexing of sequence locations. We define a location datatype that can be incorporated naturally into common off-the-shelf database systems. We also describe an implementation of alignment tables in BLASTGRES, an extension of the open-source POSTGRESQL database system that provides indexing and operators on locations required for querying alignment tables.
This paper also reviews several successful large-scale applications of alignment tables for Trans-Genomic Query. Tables with millions of alignments have been used in queries about alternative splicing, an area of genomic analysis concerning the way in which a single gene can yield multiple transcripts. Comparative genomics is a large potential application area for TGQ and alignment tables.
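
To make the idea of an alignment table as a binary relationship on locations concrete, here is a minimal Python sketch. It is not the BLASTGRES implementation or its API: the `Location` class, its fields (`seq_id`, `start`, `end`), the half-open interval convention, the `aligned_to` helper, and the sample sequence names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Location:
    """Hypothetical location type: a segment of a named sequence."""
    seq_id: str
    start: int  # inclusive (assumed convention)
    end: int    # exclusive (assumed convention)

    def overlaps(self, other: "Location") -> bool:
        # Two locations overlap if they lie on the same sequence
        # and their half-open intervals intersect.
        return (self.seq_id == other.seq_id
                and self.start < other.end
                and other.start < self.end)


# An alignment table is modeled here as a list of (left, right) location pairs.
Alignment = Tuple[Location, Location]


def aligned_to(table: List[Alignment], query: Location) -> List[Location]:
    """Return the right-hand locations whose left-hand partner overlaps the query."""
    return [right for left, right in table if left.overlaps(query)]


# Illustrative data only; these are not real alignments.
table = [
    (Location("human_chr1", 100, 200), Location("mouse_chr4", 5000, 5100)),
    (Location("human_chr1", 150, 300), Location("rat_chr5", 900, 1050)),
    (Location("human_chr2", 100, 200), Location("mouse_chr7", 10, 110)),
]

hits = aligned_to(table, Location("human_chr1", 180, 190))
```

This sketch scans the whole table for every query, which would not scale to tables with millions of alignments. The abstract notes that efficient indexing of locations is one of the key implementation challenges, and that BLASTGRES provides such indexing inside the database system.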
Ruey-Lung Hsiao, Douglass Stott Parker, Yi Xing, Alissa M. Resch, Christopher J. Lee, "Solving the Problem of Trans-Genomic Query with Alignment Tables", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 3, pp. 432-447, July-September 2008, doi:10.1109/TCBB.2007.1073