Data Compression Conference (DCC '04)
Sequential Universal Lossless Techniques for Compression of Patterns and Their Description Length
Snowbird, Utah
March 23-March 25
ISBN: 0-7695-2082-0
BibTeX:
@article{10.1109/DCC.2004.1281487,
  author = {Gil I. Shamir},
  title = {Sequential Universal Lossless Techniques for Compression of Patterns and Their Description Length},
  journal = {Data Compression Conference},
  volume = {0},
  year = {2004},
  issn = {1068-0314},
  pages = {419},
  doi = {http://doi.ieeecomputersociety.org/10.1109/DCC.2004.1281487},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
A pattern is a sequence of indices that contains all consecutive integer indices from 1 up to some integer $k$, in increasing order of first occurrence. If the alphabet of the source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence, which, in turn, can be compressed by itself. In this paper, two low-complexity sequential schemes are proposed for universally compressing patterns obtained from sequences generated by independent, identically distributed (i.i.d.) sources with unknown (possibly large) alphabets of unknown size. The description lengths that both schemes assign to a pattern are investigated and bounded by rigorous closed-form expressions in terms of the maximum likelihood (ML) probability of the underlying i.i.d. sequence. In particular, each distinct index in the pattern is shown to cost $0.5 \log(n/k^3) + 1.59 \log e$ bits above the i.i.d. ML cost. This results in a description length for the unknown parameters that is shorter than the minimum code length of an i.i.d. sequence if there are more than $e^{19/18} n^{1/3}$ indices in the pattern. The sequential performance results are then used to establish a connection between the pattern entropy and the underlying i.i.d. entropy. This final result shows that for large alphabets (including those larger than $n$), recently derived universal coding redundancy bounds for coding patterns are negligible compared to the reduction in entropy from the underlying i.i.d. one.
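As an illustration of the definitions in the abstract, the following minimal Python sketch builds the pattern of a sequence and evaluates the two quoted expressions numerically. It is not the paper's coding scheme. The function names are hypothetical, and two points are assumptions rather than statements from the abstract: that $n$ is the sequence length, and that the logarithms are base 2 (suggested by the result being given in bits).

```python
import math


def pattern(sequence):
    """Replace each symbol by the 1-based rank of its first occurrence."""
    first_seen = {}  # symbol -> index assigned when it first appeared
    indices = []
    for symbol in sequence:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        indices.append(first_seen[symbol])
    return indices


def extra_bits_per_index(n, k):
    """Quoted per-index cost above the i.i.d. ML cost:
    0.5*log(n/k^3) + 1.59*log(e), with base-2 logs (an assumption)."""
    return 0.5 * math.log2(n / k**3) + 1.59 * math.log2(math.e)


def index_threshold(n):
    """Quoted number of indices, e^(19/18) * n^(1/3), above which the
    cost of the unknown parameters drops below the i.i.d. one."""
    return math.exp(19 / 18) * n ** (1 / 3)


if __name__ == "__main__":
    print(pattern("lossless"))
    n = 10**6  # assumed sequence length
    print(index_threshold(n))
    print(extra_bits_per_index(n, 100), extra_bits_per_index(n, 1000))
```

Under these assumptions, for n = 10^6 the threshold is a little under 288 indices: the per-index expression is positive at k = 100 and negative at k = 1000. Because both quoted constants are rounded, the sign change of the expression only approximately coincides with the threshold.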
Citation:
Gil I. Shamir, "Sequential Universal Lossless Techniques for Compression of Patterns and Their Description Length," Data Compression Conference (DCC '04), p. 419, 2004.
