BioSeek: Exploiting Source-Capability Information for Integrated Access to Multiple Bioinformatics Data Sources
13th IEEE International Conference on BioInformatics and BioEngineering (2003)
Mar. 10, 2003 to Mar. 12, 2003
Ling Liu, Georgia Institute of Technology
David Buttler, Georgia Institute of Technology
Terence Critchlow, Lawrence Livermore National Laboratory
Wei Han, Georgia Institute of Technology
Henrique Paques, Georgia Institute of Technology
Calton Pu, Georgia Institute of Technology
Dan Rocco, Georgia Institute of Technology
Modern bioinformatics data sources are widely used by molecular biologists for homology searching and new drug discovery. User-friendly yet responsive access is one of the most desirable properties for integrated access to the rapidly growing, heterogeneous, and distributed collection of data sources. The increasing volume and diversity of digital information related to bioinformatics (such as genomes, protein sequences, and protein structures) have led to a growing problem that conventional data management systems do not face: finding which information sources, out of many candidates, are the most relevant and most accessible for answering a given user query. We refer to this problem as the query routing problem. In this paper we introduce the notation and issues of query routing, and present a practical solution for designing a scalable query routing system based on multi-level progressive pruning strategies. The key idea is to create and maintain source-capability profiles independently, and to provide algorithms that dynamically discover relevant information sources for a given query through the smart use of source profiles. Compared to the keyword-based indexing techniques adopted in most search engines and software, our approach offers fine-grained interest matching, and is therefore more powerful and effective for handling queries with complex conditions.
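As a rough illustration of the idea of routing a query through source-capability profiles with progressive pruning, the following minimal Python sketch may help. It is not the BioSeek implementation: the `SourceProfile` and `Query` classes, their fields, the three pruning levels, and the source names are all hypothetical, chosen only to show how each level can narrow the candidate set before the next, finer check runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical capability profile for one data source (an assumption)."""
    name: str
    data_types: frozenset   # kinds of data the source holds
    searchable: frozenset   # attributes the source accepts as query conditions
    available: bool = True  # whether the source is currently reachable


@dataclass(frozen=True)
class Query:
    """Hypothetical query: a requested data type plus constrained attributes."""
    data_type: str
    conditions: frozenset


def route(query, profiles):
    """Return the names of sources that survive every pruning level."""
    # Level 1: coarse pruning on the kind of data requested.
    level1 = [p for p in profiles if query.data_type in p.data_types]
    # Level 2: finer pruning; a source must support every query condition.
    level2 = [p for p in level1 if query.conditions <= p.searchable]
    # Level 3: drop sources that are not currently accessible.
    return [p.name for p in level2 if p.available]


# Illustrative profiles only; the names and capabilities are made up.
profiles = [
    SourceProfile("SourceA", frozenset({"protein_sequence"}),
                  frozenset({"organism", "length"})),
    SourceProfile("SourceB", frozenset({"protein_sequence"}),
                  frozenset({"organism"})),
    SourceProfile("SourceC", frozenset({"genome"}),
                  frozenset({"organism", "length"})),
    SourceProfile("SourceD", frozenset({"protein_sequence"}),
                  frozenset({"organism", "length"}), available=False),
]

query = Query("protein_sequence", frozenset({"organism", "length"}))
print(route(query, profiles))  # prints ['SourceA']
```

In this sketch, a keyword-style match on "protein_sequence" alone would return three sources, whereas checking the query's conditions against each profile leaves only the one source that can actually evaluate them and is reachable.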
H. Paques et al., "BioSeek: Exploiting Source-Capability Information for Integrated Access to Multiple Bioinformatics Data Sources," 13th IEEE International Conference on BioInformatics and BioEngineering (BIBE), Bethesda, Maryland, 2003, pp. 263.