Fourth IEEE International Conference on Data Mining (ICDM'04)
Correlation Preserving Discretization
Brighton, United Kingdom
November 1-4, 2004
ISBN: 0-7695-2142-8
Sameep Mehta, The Ohio State University
Srinivasan Parthasarathy, The Ohio State University
Hui Yang, The Ohio State University
Discretization is a crucial preprocessing primitive for a variety of data warehousing and mining tasks. In this article we present a novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate datasets. The algorithm leverages the underlying correlation structure in the dataset to obtain the discrete intervals, and ensures that the inherent correlations are preserved. The approach also extends easily to datasets containing missing values. We demonstrate the efficacy of the approach on real datasets and as a preprocessing step for both classification and frequent itemset mining tasks. We also show that the intervals are meaningful and can uncover hidden patterns in data.
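The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of discretizing in PCA space, not the authors' procedure. It assumes NumPy, at least two continuous attributes, no missing values, and equal-width bins along each retained principal component. The function name `pca_equal_width_bins` and its parameters are hypothetical.

```python
import numpy as np

def pca_equal_width_bins(X, n_components=1, n_bins=4):
    """Illustrative only: bin samples along their principal components.

    Assumptions (not from the paper): equal-width bins, complete data,
    and at least two attributes in X.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each attribute

    # Eigen-decomposition of the covariance matrix captures the
    # correlation structure between attributes.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # shape (d, n_components)

    scores = Xc @ components                     # projections, shape (n, n_components)

    labels = np.empty(scores.shape, dtype=int)
    edges = []
    for j in range(scores.shape[1]):
        col = scores[:, j]
        e = np.linspace(col.min(), col.max(), n_bins + 1)
        # Use interior edges only, so labels fall in 0 .. n_bins - 1.
        labels[:, j] = np.digitize(col, e[1:-1])
        edges.append(e)
    return labels, edges, components
```

Because the cut points are placed along directions of maximal variance rather than along each attribute separately, strongly correlated attributes end up sharing interval boundaries. How the paper actually chooses cut points and maps them back to the original attributes is not stated in the abstract and is not reproduced here.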
Index Terms:
Unsupervised Discretization, Missing Data
Citation:
Sameep Mehta, Srinivasan Parthasarathy, Hui Yang, "Correlation Preserving Discretization," Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 479-482, 2004