String Processing and Information Retrieval, International Symposium on (1999)
Sept. 21, 1999 to Sept. 24, 1999
Arcot Rajasekar, University of California at San Diego
Relational databases and Datalog view each attribute as indivisible. This view, though useful in several applications, does not provide a suitable database paradigm for genetic, multimedia or scientific databases. Data in these applications are unstructured, and querying on substrings of attribute values is often necessary. Moreover, because the data are imprecise and incomplete, approximate reasoning is also indispensable. Our aim is to view strings as database objects that can be compared, divided, subsumed, interpreted and approximated. Allowing such operations on strings enriches the semantics and increases the expressive power of database languages.

In this paper we develop an extension to the relational algebra, augmenting it with the concept of a string expression that has a rich structure of string variables, mapping functions, interpreted string operations and approximate evaluations. We study properties of such expressions and show that many of the well-known properties of relational algebra hold in the extension. We also discuss an extension to Datalog, Datalog(String), and an implementation of a prototype system called S-log. S-log integrates a pattern-matching routine in a Datalog framework.

We contend that string-oriented database systems would be useful in applications that require efficient sub-structure analysis, such as aligning DNA strings using motifs and retrieving and synthesizing iconic images based on content.
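To make the idea of querying on substrings with approximate evaluation concrete, the following is a minimal sketch and not the paper's algebra or the S-log implementation. It assumes a relation is a list of dictionaries, that selection takes an arbitrary predicate, and that approximation is measured by edit distance between a motif and the best-matching substring of an attribute value (computed with a standard dynamic program, swapped in here for illustration). The names `select`, `min_substring_distance` and the `genes` relation are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def min_substring_distance(text: str, motif: str) -> int:
    """Smallest edit distance between `motif` and any substring of `text`.

    Standard approximate-matching dynamic program: a match may start at
    any position in `text` at no cost, so the top cell of each column is 0.
    """
    m = len(motif)
    prev = list(range(m + 1))  # column for the empty text prefix
    best = prev[m]
    for ct in text:
        curr = [0]  # a match may begin right after this text position
        for i, cp in enumerate(motif, 1):
            cost = 0 if cp == ct else 1
            curr.append(min(prev[i] + 1,          # extra character in text
                            curr[i - 1] + 1,      # motif character missing
                            prev[i - 1] + cost))  # match or substitution
        prev = curr
        best = min(best, prev[m])
    return best


def select(relation: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Hypothetical relation with a string-valued attribute "seq".
genes = [
    {"id": "g1", "seq": "ACGTTGCA"},
    {"id": "g2", "seq": "ACGATGCA"},
    {"id": "g3", "seq": "CCCCCCCC"},
]

# Exact substring query: the attribute is no longer treated as indivisible.
exact = select(genes, lambda r: "GTT" in r["seq"])

# Approximate query: tolerate at most one edit when matching the motif.
approx = select(genes, lambda r: min_substring_distance(r["seq"], "GTT") <= 1)

print([r["id"] for r in exact])
print([r["id"] for r in approx])
```

In this sketch the exact query returns only `g1`, while the approximate query also returns `g2`, whose sequence contains `GAT`, one substitution away from the motif.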
relational algebra, datalog, pattern-matching, query processing, approximate reasoning
A. Rajasekar, "String-Oriented Databases," String Processing and Information Retrieval, International Symposium on (SPIRE), Cancun, Mexico, 1999, pp. 158.