loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
The 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS'02)
Static Optimality Theorem for External Memory String Access
Vancouver, BC, Canada
November 16-November 19
ISBN: 0-7695-1822-2
Valentina Ciriani, University of Pisa
Paolo Ferragina, University of Pisa
Fabrizio Luccio, University of Pisa
S. Muthukrishnan, AT&T Labs - Research

Data warehouses are increasingly storing and managing large scale string data, and dealing with large volume of transactions that update and search string data. Motivated by this context, we initiate the study of self-adjusting data structures for string dictionary operations, that is, data structures that are designed to be efficient on an entire sequence rather than individual string operations. Furthermore, we study this problem in the external memory model where string data is too massive to be stored in internal memory and has to reside in disks; each access to a disk page fetches B items, and the cost of the operations is the number of pages accessed (I/Os).

We show that given a dictionary of n strings S_1 , \ldots ,S_n of total length N, a sequence of m string searches S_{i_1 } ,S_{i_2 } , \ldots ,S_{i_m } takes 0(\sum\limits_{j = 1}^m {(\frac{{\left| {S_{i_j } } \right|}}{B}} + \sum\limits_{i = 1}^n {(n_i } \log _B \frac{m}{{n_i }})) expected I/Os, where ni is the number of times Si is queried. Inserting or deleting a string S takes 0(\frac{{\left| S \right|}}{B} + \log _B n) expected amortized I/Os. The \sum\nolimits_{j = 1}^m \frac{{\left| {S_{i_j } } \right|}}{B} term is a lower bound for reading the input; the \sum\nolimits_{i = 1}^n {n_i } \log _B \frac{m}{{n_i }} term is the entropy of the query sequence and is a standard information-theoretic lower bound. This is the Static Optimality Theorem for external-memory string access. The static optimality theorem was first formalized and proved by Tarjan and Sleator for numerical dictionary in their classic "splay" trees paper in 1985 [16]; they left open the B-tree case for numerical values (page 684), and a fortiori, the case of string data in an external-memory setting, that we settle here. We obtain our result not by using traditional "splay" operations on search trees as in [16], but by designing a novel and conceptually simple self-adjusting data structure based on the well-known skip lists.

Citation:
Valentina Ciriani, Paolo Ferragina, Fabrizio Luccio, S. Muthukrishnan, "Static Optimality Theorem for External Memory String Access," focs, pp.219, The 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS'02), 2002
Usage of this product signifies your acceptance of the Terms of Use.