loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Data Compression Conference (DCC '95)
Unbounded length contexts for PPM
Snowbird, Utah
March 28-March 30
ISBN: 0-8186-7012-6
J.G. Cleary, Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
W.J. Teahan, Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
I.H. Witten, Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
The prediction by partial matching (PPM) data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten, and a series of improvements was described by Moffat (1990), culminating in a careful implementation, called PPMC, which has become the benchmark version. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. PPM, is a finite-context statistical modeling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. Prediction probabilities for each context in the model are calculated from frequency counts which are updated adaptively; and the symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding. The paper describes a new algorithm, PPM*, which exploits contexts of unbounded length. It reliably achieves compression superior to PPMC, although our current implementation uses considerably greater computational resources (both time and space). The basic PPM compression scheme is described, showing the use of contexts of unbounded length, and how it can be implemented using a tree data structure. Some results are given that demonstrate an improvement of about 6% over the old method.
Index Terms:
data compression; tree data structures; prediction theory; probability; word processing; arithmetic codes; statistical analysis; estimation theory; unbounded length contexts; prediction by partial matching; data compression; performance standard; lossless text compression; PPMC; finite-context statistical modeling; fixed-order context models; input sequence; prediction probabilities; frequency counts; predicted distribution; arithmetic coding; PPM*; tree data structure
Citation:
J.G. Cleary, W.J. Teahan, I.H. Witten, "Unbounded length contexts for PPM," dcc, pp.52, Data Compression Conference (DCC '95), 1995
Usage of this product signifies your acceptance of the Terms of Use.