It was conjectured that the average redundancy rate, r/sub n/, for the Lempel-Ziv code (1978) is /spl Theta/(loglog n/log n) where n is the length of the database sequence. However, it was also known that for infinitely many n the redundancy r/sub n/ is bounded from the below by 2/log n. We settle the above conjecture in the negative by proving that for a memoryless source the average redundancy rate attains asymptotically Er/sub n/=(A+/spl delta/(n))/log n+O(loglog n/log/sup 2/n) where A is an explicitly given constant that depends on the source characteristics, and /spl delta/(x) is a fluctuating function. This result is a consequence of the established second-order properties for the number of phrases in the Lempel-Ziv algorithm. We also derive the leading term for the kth moment of the number of phrases. Finally, we discuss generalized Lempel-Ziv codes for which the average redundancy rates are computed and compared with the original Lempel-Ziv code.
Index Terms:
encoding; data compression; memoryless systems; codes; sequences; redundancy; channel capacity; Lempel-Ziv code; average redundancy rate; database sequence; memoryless source; source characteristics; fluctuating function; second-order properties; Lempel-Ziv algorithm; leading term; generalized Lempel-Ziv codes
Citation:
G. Louchard, W. Szpankowski, "Average redundancy rate of the Lempel-Ziv code," dcc, pp.92, Data Compression Conference (DCC '96), 1996