Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)
An Evaluation of Progressive Sampling for Imbalanced Data Sets
Hong Kong, China
December 18-22, 2006
ISBN: 0-7695-2702-7
Willie Ng, Nanyang Technological University, Singapore
Manoranjan Dash, Nanyang Technological University, Singapore
One of the emerging challenges for the data mining research community is to allow learning algorithms to mine huge databases. Sampling has often been suggested as an effective way to circumvent memory limitations and to improve processing speed. In this paper, we study the learning-curve sampling method, an approach for applying machine learning algorithms to massive data sets. We show that a naive application of progressive sampling to data sets with highly imbalanced class distributions is often not very effective for training a learning algorithm. We then present a refinement of progressive sampling that works well in practice and converges to the desired sample size quickly and accurately. Empirical results on a number of large data sets show that our approach improves the performance of progressive sampling.
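For readers unfamiliar with the term, the following is a minimal sketch of generic (naive) progressive sampling, not the refinement proposed in the paper, whose details the abstract does not give. It assumes a geometric schedule of sample sizes and a simple plateau test on the learning curve; the names `geometric_schedule`, `progressive_sample`, `train_and_score`, and the parameters `n0`, `a`, and `tol` are hypothetical choices made for illustration.

```python
import random


def geometric_schedule(n0, a, n_max):
    """Return sample sizes n0, a*n0, a^2*n0, ... capped at n_max."""
    sizes = []
    n = n0
    while n < n_max:
        sizes.append(n)
        n *= a
    sizes.append(n_max)  # always finish with the full data set size
    return sizes


def progressive_sample(data, train_and_score, n0=100, a=2, tol=0.005, seed=0):
    """Grow the training sample until the learning curve flattens.

    train_and_score is a caller-supplied function (an assumption of this
    sketch) that trains a model on the given sample and returns a score
    where higher is better.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)  # a prefix of a shuffled list is a random sample

    history = []
    prev_score = None
    for n in geometric_schedule(n0, a, len(shuffled)):
        score = train_and_score(shuffled[:n])
        history.append((n, score))
        # Stop when the gain over the previous sample size is below tol.
        if prev_score is not None and score - prev_score < tol:
            break
        prev_score = score
    return n, history
```

Because the sketch draws uniform random samples, a small sample from a highly imbalanced data set may contain very few minority-class examples, which is the setting in which the abstract reports that naive progressive sampling is often not very effective.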
Citation:
Willie Ng and Manoranjan Dash, "An Evaluation of Progressive Sampling for Imbalanced Data Sets," Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), pp. 657-661, 2006.