Issue No.06 - November/December (2008 vol.12)
pp: 37-49
Jing Gao, University of Illinois, Urbana-Champaign
Bolin Ding, University of Illinois, Urbana-Champaign
Wei Fan, IBM T.J. Watson Research Center
Jiawei Han, University of Illinois, Urbana-Champaign
Philip S. Yu, University of Illinois, Chicago
ABSTRACT
Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. A growing number of applications produce data streams rather than finite stored data sets, and such streams are a challenge for traditional classification algorithms. Concept drifts and skewed distributions, two common properties of data stream applications, make learning in streams difficult. The authors aim to develop a new approach to classifying skewed data streams that uses an ensemble of models to match the distribution over under-samples of negatives and repeated samples of positives.
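The abstract gives only a one-sentence outline of the approach, so the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes that each ensemble member is trained on all retained positive examples plus one disjoint slice of the under-sampled negatives, and that member outputs are averaged. The base learner (`CentroidModel`), the helper names, the value `k=4`, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import fmean


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


class CentroidModel:
    """Toy base learner (an assumption): scores a point by relative distance
    to the positive and negative class centroids."""

    def __init__(self, positives, negatives):
        self.pos = centroid(positives)
        self.neg = centroid(negatives)

    def prob_positive(self, x):
        d_pos = math.dist(x, self.pos)
        d_neg = math.dist(x, self.neg)
        if d_pos + d_neg == 0.0:
            return 0.5  # both centroids coincide with x, so no evidence either way
        return d_neg / (d_pos + d_neg)


def build_ensemble(positives, negatives, k, rng):
    """Train up to k models; each reuses every positive example and sees
    one disjoint slice of the shuffled negatives (under-sampling)."""
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    slices = [shuffled[i::k] for i in range(k)]
    return [CentroidModel(positives, s) for s in slices if s]


def ensemble_prob(models, x):
    """Average the members' positive-class scores (simple model averaging)."""
    return fmean(m.prob_positive(x) for m in models)


# Synthetic skewed data: 5 positives near (3, 3), 200 negatives near (0, 0).
rng = random.Random(0)
positives = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]
negatives = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(200)]

models = build_ensemble(positives, negatives, k=4, rng=rng)
score_near_positive = ensemble_prob(models, [3.0, 3.0])
score_near_negative = ensemble_prob(models, [0.0, 0.0])
```

In a streaming setting, the `positives` list would presumably be carried over and extended from one data chunk to the next, while `negatives` would come from the most recent chunk only; that detail is likewise an assumption, since the abstract does not spell it out.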
INDEX TERMS
data stream, classification algorithms, concept drifts, data mining, model averaging, skewed distributions
CITATION
Jing Gao, Bolin Ding, Wei Fan, Jiawei Han, Philip S. Yu, "Classifying Data Streams with Skewed Class Distributions and Concept Drifts", IEEE Internet Computing, vol. 12, no. 6, pp. 37-49, November/December 2008, doi:10.1109/MIC.2008.119