Solving the Problem of Trans-Genomic Query with Alignment Tables
July-September 2008 (vol. 5 no. 3)
pp. 432-447
The trans-genomic query (TGQ) problem, that of enabling the free query of biological information even across genomes, is a central challenge facing bioinformatics. Solutions to this problem can alter the nature of the field, moving it beyond the jungle of data integration and expanding the number and scope of questions that can be answered.

An alignment table is a binary relationship on locations (sequence segments). An important special case of alignment tables is the hit table: a table of pairs of highly similar segments produced by alignment tools like BLAST. However, alignment tables also include general binary relationships, and can represent any useful connection between sequence locations. They can be curated, and provide a high-quality queryable backbone of connections between biological information. Alignment tables can thus be a natural foundation for TGQ, as they permit a central part of the TGQ problem to be reduced to purely technical problems involving tables of locations.

Key challenges in implementing alignment tables include efficient representation and indexing of sequence locations. We define a location datatype that can be incorporated naturally into common off-the-shelf database systems. We also describe an implementation of alignment tables in BLASTGRES, an extension of the open-source POSTGRESQL database system that provides the indexing and operators on locations required for querying alignment tables.

This paper also reviews several successful large-scale applications of alignment tables for TGQ. Tables with millions of alignments have been used in queries about alternative splicing, an area of genomic analysis concerning the way in which a single gene can yield multiple transcripts. Comparative genomics is a large potential application area for TGQ and alignment tables.
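To make the idea of an alignment table as a binary relationship on locations concrete, the following is a minimal Python sketch. It is not the BLASTGRES implementation or its API: the `Location` class, the half-open interval convention, the `aligned_to` helper, and the sequence names are all assumptions made for illustration, and the linear scan stands in for the index-supported operators the abstract refers to.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Location:
    """Hypothetical location type: half-open interval [start, end) on a named sequence."""
    seq_id: str
    start: int
    end: int

    def overlaps(self, other: "Location") -> bool:
        # Two locations overlap only if they lie on the same sequence
        # and their intervals share at least one position.
        return (self.seq_id == other.seq_id
                and self.start < other.end
                and other.start < self.end)


# An alignment table, modeled here as a plain list of (left, right) location pairs.
AlignmentTable = List[Tuple[Location, Location]]


def aligned_to(table: AlignmentTable, query: Location) -> List[Location]:
    """Return the right-hand locations whose left-hand partner overlaps the query.

    This is a linear scan; a database implementation would use an interval index.
    """
    return [right for left, right in table if left.overlaps(query)]


# Illustrative data only; the sequence names and coordinates are made up.
hits: AlignmentTable = [
    (Location("human_chr6", 100, 200), Location("mouse_chr17", 500, 600)),
    (Location("human_chr6", 400, 450), Location("mouse_chr17", 900, 950)),
    (Location("human_chr1", 100, 200), Location("mouse_chr4", 10, 110)),
]

result = aligned_to(hits, Location("human_chr6", 150, 160))
```

In this sketch a query is a location, and answering it reduces to an overlap test over a table of locations, which is the kind of operation that an interval index is meant to accelerate.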

Douglass Stott Parker, Ruey-Lung Hsiao, Yi Xing, Alissa M. Resch, Christopher J. Lee, "Solving the Problem of Trans-Genomic Query with Alignment Tables," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 3, pp. 432-447, July-Sept. 2008, doi:10.1109/TCBB.2007.1073