Improved Word-Aligned Binary Compression for Text Indexing
June 2006 (vol. 18 no. 6)
pp. 857-861
We present an improved compression mechanism for handling the compressed inverted indexes used in text retrieval systems, extending the word-aligned binary coding carry method. Experiments using two typical document collections show that the new method obtains superior compression to previous static codes, without penalty in terms of execution speed.
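For readers unfamiliar with word-aligned binary coding, the following minimal Python sketch illustrates the general idea using the Simple-9 layout from Anh and Moffat's earlier work on word-aligned binary codes: each 32-bit word holds a 4-bit selector and 28 data bits split into equal-width fields. It is not the improved method presented in this paper, whose details the abstract does not give. The names LAYOUTS, encode, and decode are hypothetical, and the encoder assumes non-negative integers below 2^28 (for example, document-number gaps).

```python
# Simple-9 style word-aligned binary coding (illustrative sketch only;
# NOT the improved method of the paper summarized above).
# Each 32-bit word = 4-bit selector (top bits) + 28 data bits.
# LAYOUTS[selector] = (number of values, bits per value).
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
           (4, 7), (3, 9), (2, 14), (1, 28)]


def encode(values):
    """Greedily pack non-negative integers (< 2**28) into 32-bit words."""
    words = []
    pos = 0
    while pos < len(values):
        # Try the densest layout first; take the first one whose field
        # width can hold every value in the next chunk.
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[pos:pos + count]
            if all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for i, v in enumerate(chunk):
                    word |= v << (i * bits)
                words.append(word)
                # A short final chunk leaves its unused fields as zero.
                pos += len(chunk)
                break
        else:
            raise ValueError("value is negative or does not fit in 28 bits")
    return words


def decode(words, n):
    """Unpack the first n integers from a list of 32-bit words."""
    out = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for i in range(count):
            out.append((word >> (i * bits)) & mask)
    # Drop any zero padding from the final word.
    return out[:n]


# Usage: a run of small gaps followed by a few larger ones.
gaps = [1] * 28 + [3, 2] + [100000]
packed = encode(gaps)
assert decode(packed, len(gaps)) == gaps
```

Because every codeword occupies a whole machine word, decoding needs only shifts and masks, which is the property that lets schemes of this family compete with byte-aligned codes on speed. The decoder here is told the number of values explicitly, a simplification chosen for this sketch.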

Index Terms:
Data compaction and compression, textual databases, indexing methods, file organization, compression, inverted index, binary code, text retrieval system, text searching, Web searching.
Citation:
Vo Ngoc Anh, Alistair Moffat, "Improved Word-Aligned Binary Compression for Text Indexing," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 6, pp. 857-861, June 2006, doi:10.1109/TKDE.2006.99