41st Annual Symposium on Foundations of Computer Science
Opportunistic data structures with applications
Redondo Beach, California
November 12-November 14
ISBN: 0-7695-0850-2
P. Ferragina, Dipt. di Inf., Pisa Univ., Italy
G. Manzini, Dipt. di Inf., Pisa Univ., Italy
We address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic because its space occupancy decreases when the input is compressible, and this space reduction is achieved at no significant slowdown in query performance. More precisely, its space occupancy is optimal in an information-content sense because a text T[1,u] is stored using O(H_k(T)) + o(1) bits per input symbol in the worst case, where H_k(T) is the kth order empirical entropy of T (the bound holds for any fixed k). Given an arbitrary string P[1,p], the opportunistic data structure supports searching for the occ occurrences of P in T in O(p + occ log^ε u) time (for any fixed ε > 0). If the data are incompressible, we achieve the best space bound currently known (Grossi and Vitter, 2000); on compressible data, our solution improves on the succinct suffix array of Grossi and Vitter (2000) and on the classical suffix tree and suffix array data structures in space, in query time, or in both. We also study our opportunistic data structure in a dynamic setting and devise a variant achieving effective search and update time bounds. Finally, we show how to plug our opportunistic data structure into the Glimpse tool (Manber and Wu, 1994). The result is an indexing tool that achieves sublinear space and sublinear query time complexity.
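The space bound above is stated in terms of the kth order empirical entropy H_k(T). As an illustration only, the following minimal Python sketch computes that quantity under one common convention, which is an assumption here rather than something the abstract spells out: each symbol is grouped by the k symbols that precede it, and H_k(T) is the length-weighted average of the zeroth-order entropies of those groups, divided by the text length. The function names `h0` and `empirical_entropy` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def h0(counts, total):
    """Zeroth-order empirical entropy, in bits per symbol, of a symbol histogram."""
    return -sum((c / total) * log2(c / total) for c in counts.values())


def empirical_entropy(text, k):
    """kth order empirical entropy of `text`, in bits per symbol.

    Assumed convention: a symbol's context is the k symbols before it.
    The first k symbols have no full context and contribute nothing.
    """
    n = len(text)
    if n == 0:
        return 0.0
    if k == 0:
        return h0(Counter(text), n)

    # For every length-k context, count the symbols that follow it.
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[text[i - k:i]][text[i]] += 1

    # Sum |T_w| * H_0(T_w) over all contexts w, then normalise by n.
    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total_bits += m * h0(counts, m)
    return total_bits / n
```

For example, `empirical_entropy("abab", 0)` is 1.0 bit per symbol, because the two symbols are equally frequent, while `empirical_entropy("abababab", 1)` is zero, because each symbol is fully determined by the one before it. This gap between H_0 and H_k is the kind of compressibility the space bound is measured against.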
Index Terms:
data structures; data compression; database indexing; computational complexity; database theory; opportunistic data structures; data indexing; entropy; data set; query performance; search; succinct suffix array; suffix tree data structures; suffix array data structures; Glimpse tool; sublinear query time complexity; sublinear space complexity
Citation:
P. Ferragina, G. Manzini, "Opportunistic data structures with applications," 41st Annual Symposium on Foundations of Computer Science (FOCS), p. 390, 2000