Data Compression Conference (dcc 2008)
March 25-March 27
ISBN: 978-0-7695-3121-2
Many methodologies and similarity measures based on data compression have recently been introduced to compute similarities between general kinds of data. Two important similarity indices are the Normalized Compression Distance (NCD) and Pattern Recognition based on Data Compression (PRDC). At first sight NCD and PRDC are quite different: the former is a direct metric, while the latter is a methodology that computes a compression distance with an intermediate step of encoding files into texts. In spite of this, it is possible to demonstrate that both are based on estimates of Kolmogorov complexity, although this is known for the former but not for the latter. This results in the definition of a new measure, the Model Conditioned Data Compression based Similarity Measure (McDCSM), which is a modified version of PRDC. The resulting link between PRDC and NCD allows methods used in the computation of PRDC to be transferred to other distance measures, by adding intermediate steps that may improve the accuracy of the obtained results.
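As an illustration of the kind of measure discussed above, NCD is commonly defined in the literature as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the length of the compressed input and serves as a computable stand-in for Kolmogorov complexity. The following is a minimal sketch under that definition, not the authors' implementation: the choice of zlib as the compressor and the helper names `compressed_size` and `ncd` are assumptions made for this example. PRDC and McDCSM are not sketched, since the abstract does not give their formulas.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two non-empty byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of x and y
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 50
    b = bytes(range(256)) * 10
    print("NCD(a, a) =", ncd(a, a))  # expected to be close to 0
    print("NCD(a, b) =", ncd(a, b))  # expected to be noticeably larger
```

With a real compressor the value is only an approximation: NCD(x, x) is typically small but not exactly zero, and values slightly above 1 can occur.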
Index Terms:
Similarity Measure, Kolmogorov Complexity, Normalized Compression Distance
Citation:
D. Cerra, M. Datcu, "A Model Conditioned Data Compression Based Similarity Measure," Data Compression Conference (dcc 2008), p. 509, 2008