Data Compression Conference (DCC '96)
Exploiting clustering in inverted file compression
Snowbird, UT
March 31-April 3, 1996
ISBN: 0-8186-7358-3
A. Moffat, Dept. of Comput. Sci., Melbourne Univ., Parkville, Vic., Australia
L. Stuiver, Dept. of Comput. Sci., Melbourne Univ., Parkville, Vic., Australia
Document databases contain large volumes of text and are now typically gigabytes in size. To query these collections efficiently, some form of index is required; without one, even the fastest pattern-matching techniques give unacceptable response times. One pervasive indexing method is the inverted file, also known as a concordance or postings file. A number of efforts have been made to capture the "clustering" effect, the tendency of a term to occur in runs of nearby documents, and to design index compression methods that condition their probability predictions on context. In these methods, information about whether or not the most recent (or second most recent, and so on) document contained term t is used to bias the prediction that the next document will contain t. We further extend this notion of context-based index compression and describe a surprisingly simple index representation that gives excellent performance on all of our test databases, allows fast decoding, and is, even in the worst case, only slightly inferior to Golomb (1966) coding.
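The abstract measures its representation against Golomb (1966) coding. As a point of reference only (this is the baseline, not the authors' clustering-aware representation), the Python sketch below Golomb-codes the d-gaps of one inverted list, that is, the differences between successive increasing document numbers. All function names are hypothetical, and the parameter choice b ≈ 0.69 · N / f_t (N documents, f_t postings for the term) is an assumption: it is a commonly used approximation that presumes term occurrences are scattered at random, which is exactly the assumption that clustering violates.

```python
import math


def golomb_parameter(num_docs: int, num_postings: int) -> int:
    """Approximate Golomb parameter b ~ 0.69 * N / f_t (assumes random, unclustered occurrences)."""
    return max(1, math.ceil(0.69 * num_docs / num_postings))


def write_bits(value: int, nbits: int, out: list[int]) -> None:
    """Append the nbits-bit binary form of value, most significant bit first."""
    for i in range(nbits - 1, -1, -1):
        out.append((value >> i) & 1)


def read_bits(bits: list[int], pos: int, nbits: int) -> tuple[int, int]:
    """Read nbits bits starting at pos; return (value, new position)."""
    value = 0
    for _ in range(nbits):
        value = (value << 1) | bits[pos]
        pos += 1
    return value, pos


def golomb_encode(x: int, b: int, out: list[int]) -> None:
    """Golomb-code a gap x >= 1: unary quotient, then truncated-binary remainder."""
    q, r = divmod(x - 1, b)
    out.extend([1] * q)          # q one-bits ...
    out.append(0)                # ... terminated by a zero
    k = (b - 1).bit_length()     # ceil(log2 b)
    threshold = (1 << k) - b     # remainders below this fit in k-1 bits
    if r < threshold:
        write_bits(r, k - 1, out)
    else:
        write_bits(r + threshold, k, out)


def golomb_decode(bits: list[int], pos: int, b: int) -> tuple[int, int]:
    """Decode one gap starting at pos; return (gap, new position)."""
    q = 0
    while bits[pos] == 1:
        q += 1
        pos += 1
    pos += 1  # skip the terminating zero
    k = (b - 1).bit_length()
    threshold = (1 << k) - b
    r = 0
    if k > 0:
        r, pos = read_bits(bits, pos, k - 1)
        if r >= threshold:  # long codeword: one more bit follows
            extra, pos = read_bits(bits, pos, 1)
            r = ((r << 1) | extra) - threshold
    return q * b + r + 1, pos


def encode_postings(doc_ids: list[int], num_docs: int) -> tuple[list[int], int]:
    """Golomb-code the d-gaps of a strictly increasing list of document numbers."""
    b = golomb_parameter(num_docs, len(doc_ids))
    bits: list[int] = []
    prev = 0
    for d in doc_ids:
        golomb_encode(d - prev, b, bits)
        prev = d
    return bits, b


def decode_postings(bits: list[int], count: int, b: int) -> list[int]:
    """Invert encode_postings, given the number of postings and the parameter b."""
    doc_ids: list[int] = []
    pos, prev = 0, 0
    for _ in range(count):
        gap, pos = golomb_decode(bits, pos, b)
        prev += gap
        doc_ids.append(prev)
    return doc_ids


if __name__ == "__main__":
    postings = [3, 5, 20, 21, 23, 76]
    bits, b = encode_postings(postings, num_docs=100)
    assert decode_postings(bits, len(postings), b) == postings
    print(f"b = {b}, {len(bits)} bits for {len(postings)} postings")
```

Because b is fixed for the whole list, every gap costs at least about log2 b bits, even a gap of 1 inside a tight cluster. This is the kind of redundancy that the context-based methods described in the abstract try to remove.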
Index Terms:
file organisation; data compression; encoding; query processing; probability; pattern recognition; decoding; very large databases; full-text databases; inverted file compression; clustering; document database query; text; pattern matching techniques; response times; indexing method; concordances; postings files; index compression methods; probability predictions; context based index compression; index representation; performance; test databases; fast decoding
Citation:
A. Moffat and L. Stuiver, "Exploiting clustering in inverted file compression," in Proc. Data Compression Conference (DCC '96), Snowbird, UT, 1996, p. 82.