Re-Pair is a dictionary-based compression method invented in 1999 by Larsson and Moffat. Although its practical performance has been established through experiments, the method has resisted all attempts at formal analysis. In this paper we show that Re-Pair compresses a sequence $T[1,n]$ over an alphabet of size $\sigma$ and with $k$-th order entropy $H_k$ to at most $2nH_k + o(n\log\sigma)$ bits, for any $k = o(\log_\sigma n)$.
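For readers unfamiliar with the two quantities in the bound, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual description of Re-Pair (repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal until no pair occurs twice) and the standard definition of k-th order empirical entropy. The function names `repair`, `expand`, `count_pairs`, and `empirical_entropy` are hypothetical, and the pair replacement here is a naive quadratic scan rather than the linear-time machinery of the original method.

```python
import math
from collections import Counter, defaultdict


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair."""
    counts = Counter()
    last = {}  # pair -> position of its last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if pair[0] == pair[1] and last.get(pair) == i - 1:
            continue  # overlaps the previous counted occurrence (e.g. "aaa")
        counts[pair] += 1
        last[pair] = i
    return counts


def repair(text):
    """Naive Re-Pair sketch: returns (final sequence, rules)."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so nothing is gained by a new rule
        sym = ("R", len(rules))  # tuple, so it cannot collide with a character
        rules[sym] = pair
        out, i = [], 0
        while i < len(seq):  # left-to-right replacement of the chosen pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Invert repair(): expand nonterminals back into the original text."""
    out, stack = [], list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)
            stack.append(left)
        else:
            out.append(s)
    return "".join(out)


def empirical_entropy(text, k):
    """k-th order empirical entropy H_k in bits per symbol (standard definition)."""
    followers = defaultdict(Counter)  # length-k context -> counts of next symbol
    for i in range(k, len(text)):
        followers[text[i - k:i]][text[i]] += 1
    bits = 0.0
    for ctr in followers.values():
        total = sum(ctr.values())
        for c in ctr.values():
            bits -= c * math.log2(c / total)
    return bits / len(text)
```

Calling `repair(text)` yields a shortened sequence plus a rule dictionary, and `expand` recovers the input from them. `len(text) * empirical_entropy(text, k)` gives the $nH_k$ term that the bound above compares against. The sketch does not encode the output into bits, so it illustrates the quantities involved rather than verifying the bound.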
Index Terms:
empirical entropy, grammar-based compression
Citation:
Gonzalo Navarro, Luís Russo, "Re-pair Achieves High-Order Entropy," Data Compression Conference (DCC 2008), p. 537, 2008