Data Compression Conference (DCC'06)
Compression and Machine Learning: A New Perspective on Feature Space Vectors
Snowbird, Utah
March 28-March 30
ISBN: 0-7695-2545-8
D. Sculley, Tufts University
Carla E. Brodley, Tufts University
The use of compression algorithms in machine learning tasks such as clustering and classification has appeared in a variety of fields, sometimes with the promise of reducing problems of explicit feature selection. The theoretical justification for such methods has been founded on an upper bound on Kolmogorov complexity and an idealized information space. An alternative view shows that compression algorithms implicitly map strings into feature space vectors, and that compression-based similarity measures compute similarity within these feature spaces. Thus, compression-based methods are not a "parameter free" magic bullet for feature selection and data representation, but are instead concrete similarity measures within defined feature spaces, and are therefore akin to the explicit feature vector models used in standard machine learning algorithms. To underscore this point, we find theoretical and empirical connections between traditional machine learning vector models and compression, encouraging cross-fertilization in future work.
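As an illustration of what a compression-based similarity measure looks like in practice, the following is a minimal sketch of the normalized compression distance. The abstract does not name a specific measure or compressor, so both choices here are assumptions: the measure is one commonly used example from the literature, and Python's standard `zlib` stands in for the compressor. The function names `compressed_size` and `ncd` are hypothetical and are not taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    The compressed size is used as a computable upper bound on
    Kolmogorov complexity. Smaller values suggest more shared structure.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs (made up for this sketch, not data from the paper).
a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
c = bytes(range(256)) * 4

print("ncd(a, b) =", ncd(a, b))
print("ncd(a, c) =", ncd(a, c))
```

The result depends on the compressor, its settings, and its window size, which is in keeping with the abstract's point that such measures are not parameter free: the choice of compressor fixes the feature space in which similarity is computed.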
Citation:
D. Sculley, Carla E. Brodley, "Compression and Machine Learning: A New Perspective on Feature Space Vectors," dcc, pp.332-332, Data Compression Conference (DCC'06), 2006