Universal Data Compression Based on the Burrows-Wheeler Transformation: Theory and Practice
October 2000 (vol. 49 no. 10)
pp. 1043-1053

Abstract—A very interesting recent development in data compression is the Burrows-Wheeler Transformation [1]. The idea is to permute the input sequence in such a way that characters with a similar context are grouped together. We provide a thorough analysis of the Burrows-Wheeler Transformation from an information theoretic point of view. Based on this analysis, the main part of the paper systematically considers techniques to efficiently implement a practical data compression program based on the transformation. We show that our program achieves a better compression rate than other programs that have similar requirements in space and time.
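The grouping effect described in the abstract can be seen in a small example. The following is a minimal sketch of the transformation in its naive, sorted-rotations form, not the authors' implementation: the paper's index terms point to suffix trees for an efficient construction, while this version is quadratic or worse and meant only for illustration. The function names `bwt` and `inverse_bwt` and the choice of sentinel character are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    The sentinel is an illustrative assumption; it must not occur in the input
    and marks the end of the sequence so the transform can be inverted.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sorting the rotations places rotations that start with the same context
    # next to each other, so the characters preceding that context (the last
    # column) end up grouped together.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "\0") -> str:
    """Invert bwt() by repeatedly prepending the last column and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original sequence.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana", "$")
    print(transformed)
    print(inverse_bwt(transformed, "$"))
```

For the input `banana` with `$` as sentinel, the output places the two `n` characters next to each other and two of the three `a` characters together, which is the kind of locality a subsequent coding stage can exploit.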

[1] M. Burrows and D. Wheeler, “A Block-Sorting Lossless Data Compression Algorithm,” Research Report 124, Digital Systems Research Center, 1994. http://gatekeeper.dec.com/pub/DEC/SRC/research-reports/abstracts/src-rr-124.html.
[2] F. Willems, Y. Shtarkov, and T. Tjalkens, “The Context-Tree Weighting Method: Basic Properties,” IEEE Trans. Information Theory, vol. 41, pp. 653-664, 1995.
[3] Y. Shtarkov, “Universal Sequential Coding of Single Messages,” Problems Information Transmission, vol. 23, no. 3, pp. 3-17, 1987.
[4] Y. Shtarkov, T. Tjalkens, and F. Willems, “Multialphabet Coding of Memoryless Sources,” Problems Information Transmission, vol. 31, no. 2, pp. 20-35, 1995.
[5] T.C. Bell, J.G. Cleary, and I.H. Witten, Text Compression. Englewood Cliffs, N.J.: Prentice Hall, 1990.
[6] R. Arnold and T. Bell, “A Corpus for the Evaluation of Lossless Compression Algorithms,” Proc. Data Compression Conf., pp. 201-210, Mar. 1997.
[7] J. Gailly, “The gzip Program, Version 1.2.4,” 1993. ftp://prep.ai.mit.edu/pub/gnu/gzip-1.2.4.tar.gz.
[8] B. Balkenhol and S. Kurtz, “Universal Data Compression Based on the Burrows and Wheeler Transformation: Theory and Practice,” technical report, Sonderforschungsbereich: Diskrete Strukturen in der Mathematik, Universität Bielefeld, 98-069, 1998. http://www.mathematik.uni-bielefeld.de/sfb343/preprints/.
[9] J.G. Cleary, R.M. Neal, and I.H. Witten, “Arithmetic Coding for Data Compression,” Comm. ACM, vol. 30, no. 6, pp. 520-540, June 1987.
[10] R. Krichevsky and V. Trofimov, “The Performance of Universal Encoding,” IEEE Trans. Information Theory, vol. 27, pp. 199-207, 1981.
[11] J. Cleary, W. Teahan, and I. Witten, “Unbounded Length Contexts for PPM,” Proc. IEEE Data Compression Conf., pp. 52-61, 1995.
[12] P. Weiner, “Linear Pattern Matching Algorithms,” Proc. 14th IEEE Ann. Symp. Switching and Automata Theory, pp. 1-11, 1973.
[13] E.M. McCreight, “A Space Economical Suffix Tree Construction Algorithm,” J. ACM, vol. 23, no. 2, pp. 262-272, 1976.
[14] E. Ukkonen, “On-Line Construction of Suffix-Trees,” Algorithmica, vol. 14, no. 3, 1995.
[15] M. Farach, “Optimal Suffix Tree Construction with Large Alphabets,” Proc. 38th Ann. Symp. Foundations of Computer Science, FOCS 97, 1997.
[16] U. Manber and E. Myers, “Suffix Arrays: A New Method for On-Line String Searches,” SIAM J. Computing, vol. 22, no. 5, pp. 935-948, 1993.
[17] K. Sadakane, “A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation,” Proc. IEEE Data Compression Conf., pp. 129-138, 1998.
[18] S. Kurtz, “Reducing the Space Requirement of Suffix Trees,” Software—Practice and Experience, vol. 29, no. 13, pp. 1,149-1,171, 1999.
[19] R. Giegerich and S. Kurtz, “From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction,” Algorithmica, vol. 19, pp. 331-353, 1997.
[20] R. Giegerich and S. Kurtz, “A Comparison of Imperative and Purely Functional Suffix Tree Constructions,” Science of Computer Programming, vol. 25, nos. 2-3, pp. 187-218, 1995.
[21] R. Irving, “Suffix Binary Search Trees,” research report, Dept. of Computer Science, Univ. of Glasgow, 1996. http://www.dcs.gla.ac.uk/rwi/paperssbst.ps .
[22] M. Crochemore and R. Vérin, “Direct Construction of Compact Acyclic Word Graphs,” Proc. Ann. Symp. Combinatorial Pattern Matching (CPM '97), pp. 116-129, 1997.
[23] N. Larsson, “The Context Trees of Block Sorting Compression,” Proc. IEEE Data Compression Conf., pp. 189-198, 1998.
[24] B. Ryabko, “Data Compression by Means of a Book Stack,” Problems Information Transmission, vol. 16, no. 4, pp. 16-21, 1980.
[25] R. Ahlswede, T. Han, and K. Kobayashi, “Universal Coding of Integers and Unbounded Search Trees,” IEEE Trans. Information Theory, vol. 43, no. 2, pp. 669-682, 1997.
[26] Q. Stout, “Improved Prefix Encodings of Natural Numbers,” IEEE Trans. Information Theory, vol. 26, pp. 607-609, 1980.
[27] J. Rissanen, “A Universal Prior for Integers and Estimation by Minimum Description Length,” Annals of Statistics, vol. 11, pp. 416-431, 1983.
[28] V. Levenshtein, “On the Redundancy and Delay of Decodable Coding of Natural Numbers,” Problems in Cybernetics, vol. 20, pp. 173-179, 1968 (in Russian).
[29] P. Elias, “Universal Codword Sets and Representation of Integers,” IEEE Trans. Information Theory, vol. 21, pp. 194-203, 1975.
[30] P. Fenwick, “Block Sorting Text Compression—Final Report,” Technical Report 130, Dept. of Computer Science, Univ. of Auckland, 1996. http://www.cs.auckland.ac.nz/peter-f/ftplink TechRep130.ps.
[31] G. Cormack and R. Horspool, “Data Compression Using Dynamic Markov Modelling,” Computer J., vol. 30, pp. 541-550, 1987.
[32] A. Moffat, “Implementing the PPM Data Compression Scheme,” IEEE Trans. Comm., vol. 38, no. 11, pp. 1,917-1,921, 1990.
[33] J. Seward, “The bzip2 Program, vers. 0.1pl2,” 1997. http://www.muraroa.demon.co.uk.
[34] M. Schindler, “The szip Homepage,” 1998. http://www.compressconsult.com/szip/.
[35] J. Ziv and A. Lempel, “Compression of Individual Sequences via Variable-Rate Coding,” IEEE Trans. Information Theory, vol. 24, no. 5, pp. 530-536, 1978.
[36] J. Ziv and A. Lempel, “A Universal Algorithm for Sequential Data Compression,” IEEE Trans. Information Theory, vol. 23, no. 3, pp. 337-343, 1977.

Index Terms:
Lossless data compression, Burrows-Wheeler Transformation, context trees, suffix trees.
Citation:
Bernhard Balkenhol, Stefan Kurtz, "Universal Data Compression Based on the Burrows-Wheeler Transformation: Theory and Practice," IEEE Transactions on Computers, vol. 49, no. 10, pp. 1043-1053, Oct. 2000, doi:10.1109/12.888040