Data Compression Conference (DCC '02)
Index Compression through Document Reordering
Snowbird, Utah
April 2-4, 2002
ISBN: 0-7695-1477-4
ASCII Text
Dan Blandford, Guy Blelloch, "Index Compression through Document Reordering," Data Compression Conference, pp. 0342, Data Compression Conference (DCC '02), 2002.
BibTex
@article{10.1109/DCC.2002.999972,
  author = {Dan Blandford and Guy Blelloch},
  title = {Index Compression through Document Reordering},
  journal = {Data Compression Conference},
  volume = {0},
  year = {2002},
  isbn = {0-7695-1477-4},
  pages = {0342},
  doi = {http://doi.ieeecomputersociety.org/10.1109/DCC.2002.999972},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY - CONF
JO - Data Compression Conference
TI - Index Compression through Document Reordering
SN - 0-7695-1477-4
SP -
EP -
A1 - Dan Blandford
A1 - Guy Blelloch
PY - 2002
KW - indexing
KW - compression
KW - locality
KW - clustering
VL - 0
JA - Data Compression Conference
ER -
An important concern in the design of search engines is the construction of an inverted index. An inverted index, also called a concordance, contains a list of documents (or posting list) for every possible search term. These posting lists are usually compressed with difference coding. Difference coding yields the best compression when the lists to be coded have high locality. Coding methods have been designed to specifically take advantage of locality in inverted indices. Here, we describe an algorithm to permute the document numbers so as to create locality in an inverted index. This is done by clustering the documents. Our algorithm, when applied to the TREC ad hoc database (disks 4 and 5), improves the performance of the best difference coding algorithm we found by fourteen percent. The improvement increases as the size of the index increases, so we expect that greater improvements would be possible on larger datasets.
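To make the link between locality and difference coding concrete, here is a minimal sketch in Python. It is not the authors' algorithm. It assumes Elias gamma coding of the gaps (the abstract does not name a specific coder), uses a hand-picked document order in place of the clustering step, and all function names and the toy index are hypothetical.

```python
def gaps(postings):
    """Difference-code a sorted posting list: first ID, then successive gaps."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def gamma_bits(n):
    """Bits used by the Elias gamma code for a positive integer n
    (assumed coder: 2 * floor(log2 n) + 1 bits)."""
    return 2 * (n.bit_length() - 1) + 1


def index_bits(index):
    """Total gamma-coded size of all posting lists, in bits."""
    return sum(gamma_bits(g) for plist in index.values() for g in gaps(plist))


def renumber(index, order):
    """Apply a document permutation: order[i] is the old ID that receives new ID i + 1."""
    new_id = {old: i + 1 for i, old in enumerate(order)}
    return {term: sorted(new_id[d] for d in plist) for term, plist in index.items()}


# Toy index (hypothetical): odd-numbered documents discuss compression,
# even-numbered documents discuss search, so each list has gaps of 2.
index = {
    "compression": [1, 3, 5, 7],
    "search": [2, 4, 6, 8],
}

# Hand-picked order that places similar documents next to each other.
# A real system would derive this order from a clustering of the documents.
order = [1, 3, 5, 7, 2, 4, 6, 8]

before = index_bits(index)                    # 22 bits
after = index_bits(renumber(index, order))    # 12 bits
print(before, after)
```

In this toy case, renumbering turns most gaps into 1, the cheapest value to encode, so the total drops from 22 to 12 bits. The figures illustrate the mechanism only and say nothing about the gains reported for the TREC data.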
Index Terms:
indexing, compression, locality, clustering
Citation:
Dan Blandford, Guy Blelloch, "Index Compression through Document Reordering," dcc, pp.0342, Data Compression Conference (DCC '02), 2002
