<p><b>Abstract</b>—For compression of text databases, semi-static word-based methods provide good performance in terms of both speed and disk space, but two problems arise. First, the memory requirements for the compression model during decoding can be unacceptably high. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained on dynamic collections. Here we show that with careful management the impact of both of these drawbacks can be kept small. Experiments with a word-based model and over 500 Mb of text show that excellent compression rates can be retained even in the presence of severe memory limitations on the decoder, and after significant expansion in the amount of stored text.</p>
Document databases, text compression, dynamic databases, word-based compression, Huffman coding.
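To make the semi-static word-based approach concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes a simple tokenizer that splits text into alternating word and non-word tokens (the regex and all function names here are illustrative assumptions). A first pass over the text counts token frequencies and builds a Huffman code, which is the model. A second pass encodes the text with that fixed model. The decoder's lookup table is the model whose memory cost the abstract refers to.

```python
import heapq
import re
from collections import Counter
from itertools import count

# Assumed tokenizer: alternating word / non-word tokens, so that
# concatenating the tokens reproduces the original text exactly.
TOKEN_RE = re.compile(r"\w+|\W+")


def tokenize(text):
    return TOKEN_RE.findall(text)


def build_code(freqs):
    """Build a Huffman code (token -> bit string) from token frequencies."""
    if not freqs:
        return {}
    tiebreak = count()  # unique tiebreaker so heap never compares nodes
    heap = [(f, next(tiebreak), tok) for tok, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct token still needs a one-bit codeword.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    # Walk the tree iteratively; internal nodes are (left, right) tuples.
    code = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            code[node] = prefix
    return code


def compress(text):
    """Semi-static coding: pass 1 builds the model, pass 2 encodes with it."""
    tokens = tokenize(text)
    code = build_code(Counter(tokens))       # pass 1: the model
    # Pass 2: a token absent from the model (e.g. a word in a newly
    # inserted document) would raise KeyError here; handling that is
    # outside this sketch.
    bits = "".join(code[t] for t in tokens)
    return code, bits


def decompress(code, bits):
    # The decoder must hold the whole model (codeword -> token) in memory.
    decode_table = {cw: tok for tok, cw in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in decode_table:  # Huffman codes are prefix-free
            out.append(decode_table[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat slept"
    code, bits = compress(text)
    assert decompress(code, bits) == text
    print(len(bits), "bits for", len(code), "distinct tokens")
```

Because the model is fixed after the first pass, every document must be encodable with it. This is why insertions containing new words, or a drift in word frequencies, eventually force either an escape mechanism or recompression of the collection.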

J. Zobel, N. Sharman and A. Moffat, "Text Compression for Dynamic Document Databases," in IEEE Transactions on Knowledge & Data Engineering, vol. 9, pp. 302-313, 1997.