A Low-Granularity Classifier for Data Streams with Concept Drifts and Biased Class Distribution
September 2007 (vol. 19 no. 9)
pp. 1202-1213
Many applications track streaming data for actionable alerts, such as network intrusions, transaction fraud, and biosurveillance abnormalities, and stream classification models have been built for this purpose. Because of concept drift, keeping a model up to date has become one of the most challenging tasks in mining data streams. State-of-the-art approaches, including both incrementally updated classifiers and ensemble classifiers, have shown that model updating is a very costly process. In this paper, we show that reducing model granularity reduces update cost: models of fine granularity enable us to efficiently pinpoint the local components of the model that are affected by concept drift. Fine granularity also enables us to derive new model components that reflect the current data distribution, thus avoiding expensive updates on a global scale. Furthermore, the actionable alerts being monitored are usually rare events, a problem that existing stream classifiers cannot handle. We address this problem and show that the low-granularity classifier handles rare events in stream data with ease. Experiments on real and synthetic data show that our approach maintains good prediction accuracy at a fraction of the model-updating cost of state-of-the-art approaches.
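
The abstract's key idea, that a model built from many small local components can be repaired piece by piece when drift touches only part of the data, can be illustrated with a toy rule-based classifier. This is a minimal sketch under our own assumptions, not the authors' algorithm: the `Rule` and `LocalRuleClassifier` classes, the 20-label sliding window per rule, the `min_obs` threshold, and majority-vote relabeling are hypothetical choices made only for illustration. Each rule is judged solely on the records it covers, so a drifted rule can be found and replaced without touching the rest of the model.

```python
# Illustrative sketch only; not the method of the paper described above.
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical fine-grained component: IF all conditions hold THEN predict label."""
    conditions: dict          # attribute name -> required value (assumed encoding)
    label: str
    window: int = 20          # how many recent covered labels the rule remembers
    recent: deque = field(init=False)

    def __post_init__(self):
        # Sliding window of the true labels of records this rule recently covered.
        self.recent = deque(maxlen=self.window)

    def covers(self, record: dict) -> bool:
        return all(record.get(attr) == value for attr, value in self.conditions.items())


class LocalRuleClassifier:
    """Hypothetical low-granularity stream classifier: drift is repaired rule by rule."""

    def __init__(self, rules, default_label, min_obs=10):
        self.rules = list(rules)
        self.default_label = default_label
        self.min_obs = min_obs    # covered records required before a rule is judged
        self.updates = 0          # number of local rule replacements so far

    def predict(self, record: dict) -> str:
        # The first covering rule decides; otherwise fall back to the default class.
        for rule in self.rules:
            if rule.covers(record):
                return rule.label
        return self.default_label

    def learn_one(self, record: dict, label: str) -> None:
        for i, rule in enumerate(self.rules):
            if not rule.covers(record):
                continue          # rules outside the record's region are never touched
            rule.recent.append(label)
            if len(rule.recent) < self.min_obs:
                continue
            majority, _ = Counter(rule.recent).most_common(1)[0]
            if majority != rule.label:
                # Local update: swap in a rule for the same region that predicts the
                # class now dominant there, keeping its recent history.
                new_rule = Rule(dict(rule.conditions), majority, rule.window)
                new_rule.recent.extend(rule.recent)
                self.rules[i] = new_rule
                self.updates += 1


# Toy stream: drift affects only records with port "22".
clf = LocalRuleClassifier([Rule({"port": "22"}, "normal"), Rule({"port": "80"}, "normal")],
                          default_label="normal")
for _ in range(30):
    clf.learn_one({"port": "22"}, "normal")
    clf.learn_one({"port": "80"}, "normal")
for _ in range(30):
    clf.learn_one({"port": "22"}, "attack")
    clf.learn_one({"port": "80"}, "normal")

print(clf.predict({"port": "22"}), clf.predict({"port": "80"}), clf.updates)  # attack normal 1
```

In this toy run only the port-22 rule is replaced (`updates` ends at 1) and the port-80 rule is never revisited, whereas a single monolithic model would typically need a global update. Because each rule sees only its own region, a class that is rare in the stream overall can still take over the rule for the region where it concentrates.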

[1] Y. Ma, B. Liu, and W. Hsu, “Pruning and Summarizing the Discovered Associations,” Proc. Fifth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '99), 1999.
[2] C. Blake and C. Merz, “UCI Repository of Machine Learning Databases,” Dept. of Information and Computer Science, Univ. of California, Irvine, 1998.
[3] J.H. Chang and W.S. Lee, “Finding Recent Frequent Itemsets Adaptively over Online Data Streams,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), 2003.
[4] Y. Chi, H. Wang, P.S. Yu, and R.R. Muntz, “Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window Data Streams,” Proc. Fourth IEEE Int'l Conf. Data Mining (ICDM '04), 2004.
[5] P. Domingos and G. Hulten, “Mining High-Speed Data Streams,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '00), pp. 71-80, 2000.
[6] W. Fan, “Systematic Data Selection to Mine Concept-Drifting Data Streams,” Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '04), 2004.
[7] M. Kubat and G. Widmer, “Learning in the Presence of Concept Drift and Hidden Contexts,” Machine Learning, vol. 23, 1996.
[8] J. Gehrke, V. Ganti, R. Ramakrishnan, and W. Loh, “BOAT— Optimistic Decision Tree Construction,” Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '99), 1999.
[9] J. Gehrke, R. Ramakrishnan, and V. Ganti, “RainForest: A Framework for Fast Decision Tree Construction of Large Datasets,” Proc. 24th Int'l Conf. Very Large Data Bases (VLDB '98), 1998.
[10] S. Guha, N. Milshra, R. Motwani, and L. O'Callaghan, “Clustering Data Streams,” Proc. 41st Ann. Symp. Foundations of Computer Science (FOCS '00), pp. 359-366, 2000.
[11] G. Hulten, L. Spencer, and P. Domingos, “Mining Time-Changing Data Streams,” Proc. Seventh ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '01), pp. 97-106, 2001.
[12] W. Li, J. Han, and J. Pei, “CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules,” Proc. IEEE Int'l Conf. Data Mining (ICDM '01), 2001.
[13] B. Liu, W. Hsu, and Y. Ma, “Integrating Classification and Association Rule Mining,” Proc. Fourth ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD '98), 1998.
[14] G. Manku and R. Motwani, “Approximate Frequency Counts over Data Streams,” Proc. 28th Int'l Conf. Very Large Data Bases (VLDB '02), 2002.
[15] W.N. Street and Y. Kim, “A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '01), 2001.
[16] A. Tsymbal, “The Problem of Concept Drift: Denitions and Related Work,” Technical Report TCD-CS-2004-15, Computer Science Dept., Trinity College Dublin, Ireland, 2004.
[17] H. Wang, W. Fan, P.S. Yu, and J. Han, “Mining Concept-Drifting Data Streams Using Ensemble Classifiers,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), 2003.
[18] P. Wang, H. Wang, X. Wu, W. Wang, and B. Shi, “On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams,” Proc. Fifth IEEE Int'l Conf. Data Mining (ICDM '05), 2005.
[19] M. Jubat and G. Widmer, “Learning in the Presence of Concept Drift and Hidden Contexts,” Machine Learning, vol. 23, 1996.

Index Terms:
Classification, data stream, concept drift, association rule
Citation:
Peng Wang, Haixun Wang, Xiaochen Wu, Wei Wang, Baile Shi, "A Low-Granularity Classifier for Data Streams with Concept Drifts and Biased Class Distribution," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 9, pp. 1202-1213, Sept. 2007, doi:10.1109/TKDE.2007.1057