This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Third IEEE International Conference on Data Mining (ICDM'03)
Regression Clustering
Melbourne, Florida
November 19-November 22
ISBN: 0-7695-1978-4
Bin Zhang, Hewlett-Packard Research Laboratories, Palo Alto, CA
Complex distribution in real-world data is often modeled by a mixture of simpler distributions. Clustering is one of the tools to reveal the structure of this mixture. The same is true to the datasets with chosen response variables that people run regression on. Without separating the clusters with very different response properties, the residue error of the regression is large. Input variable selection could also be misguided to a higher complexity by the mixture. In Regression Clustering (RC), K (>1) regression functions are applied to the dataset simultaneously which guide the clustering of the dataset into K subsets each with a simpler distribution matching its guiding function. Each function is regressed on its own subset of data with a much smaller residue error. Both the regressions and the clustering optimize a common objective function. We present a RC algorithm based on K-Harmonic Means clustering algorithm and compare it with other existing RC algorithms based on K-Means and EM.
Citation:
Bin Zhang, "Regression Clustering," icdm, pp.451, Third IEEE International Conference on Data Mining (ICDM'03), 2003
Usage of this product signifies your acceptance of the Terms of Use.