Adapted One-versus-All Decision Trees for Data Stream Classification
May 2009 (vol. 21 no. 5)
pp. 624-637
Sattar Hashemi, Monash University, Melbourne
Ying Yang, Monash University, Melbourne
Zahra Mirzamomen, Iran University of Science and Technology, Tehran
Mohammadreza Kangavari, Iran University of Science and Technology, Tehran
One-versus-all (OVA) decision trees learn k individual binary classifiers, each trained to distinguish the instances of a single class from the instances of all other classes. OVA thus differs from most existing data stream classification schemes, which use multiclass classifiers that discriminate among all classes at once. This paper advocates several notable advantages of OVA for data stream classification. First, error correlation among OVA's component classifiers is low and their diversity is correspondingly high, which leads to high classification accuracy. Second, OVA readily accommodates new class labels, which often appear in data streams. However, key challenges remain in deploying traditional OVA for classifying data streams. First, because every instance is fed to all component classifiers, OVA is known to be an inefficient model. Second, OVA's classification accuracy is adversely affected by the imbalanced class distributions common in data streams. This paper addresses these key challenges and consequently proposes a new OVA scheme adapted for data stream classification. Theoretical analysis and empirical evidence show that the adapted OVA offers faster training, faster updating, and higher classification accuracy than many popular existing data stream classification algorithms.
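The OVA decomposition described in the abstract can be sketched in a few lines of code. The sketch below is illustrative only and does not reproduce the paper's adapted algorithm: it substitutes a toy nearest-centroid scorer for the paper's decision trees, and all class and method names (`CentroidScorer`, `OneVersusAll`, `add_class`) are assumptions introduced here. It shows the two structural points the abstract makes: prediction picks the class whose binary component is most confident, and a new class label only requires training one additional component, leaving the existing ones untouched.

```python
class CentroidScorer:
    """Toy binary scorer: higher score = closer to the positive-class centroid.
    Stands in for the binary decision trees used in the paper."""

    def fit(self, X, y_binary):
        pos = [x for x, is_pos in zip(X, y_binary) if is_pos]
        dim = len(X[0])
        self.centroid = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
        return self

    def score(self, x):
        # Negative squared distance to the centroid: larger means more confident.
        return -sum((a - b) ** 2 for a, b in zip(x, self.centroid))


class OneVersusAll:
    """k binary components, one per class; predict via the most confident one."""

    def __init__(self):
        self.scorers = {}

    def fit(self, X, y):
        for label in set(y):
            y_bin = [t == label for t in y]          # this class vs. all others
            self.scorers[label] = CentroidScorer().fit(X, y_bin)
        return self

    def add_class(self, X_new, label):
        # A new class label appearing in the stream needs only one extra
        # binary component; existing components are not retrained.
        self.scorers[label] = CentroidScorer().fit(X_new, [True] * len(X_new))

    def predict(self, x):
        return max(self.scorers, key=lambda label: self.scorers[label].score(x))


X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
y = ["a", "a", "b", "b"]
ova = OneVersusAll().fit(X, y)
print(ova.predict((0.2, 0.1)))                      # near class "a" centroid
ova.add_class([(10.0, 0.0), (9.8, 0.2)], "c")
print(ova.predict((9.9, 0.1)))                      # new class "c" handled
```

Note that the sketch also exposes the inefficiency the abstract points out: `predict` consults every component for every instance, which is one of the costs the adapted OVA scheme is designed to reduce.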

[1] C.C. Aggarwal, J. Han, J. Wang, and P.S. Yu, “A Framework for Clustering Evolving Data Streams,” Proc. 29th Int'l Conf. Very Large Data Bases (VLDB '03), pp. 81-92, 2003.
[2] R. Akbani, S. Kwek, and N. Japkowicz, “Applying Support Vector Machines to Imbalanced Datasets,” Proc. 15th European Conf. Machine Learning (ECML '04), pp. 39-50, 2004.
[3] J.A. Baranauskas and M.C. Monard, “Combining Symbolic Classifiers from Multiple Inducers,” Knowledge-Based Systems, vol. 16, pp. 129-136, 2003.
[4] L. Beygelzimer, J. Langford, and B. Zadrozny, “Weighted One-against-All,” Proc. 20th Nat'l Conf. Artificial Intelligence (AAAI '05), pp. 720-725, 2005.
[5] W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int'l Conf. Machine Learning (ICML '95), pp. 115-123, 1995.
[6] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, second ed. MIT Press and McGraw-Hill, 2001.
[7] T.G. Dietterich, “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization,” Machine Learning, vol. 40, pp. 139-158, 2000.
[8] T.G. Dietterich and G. Bakiri, “Solving Multiclass Learning Problems via Error-Correcting Output Codes,” J. Artificial Intelligence Research, vol. 2, pp. 263-286, 1995.
[9] V. Estruch, C. Ferri, J. Hernández-Orallo, and M.J. Ramírez-Quintana, “Bagging Decision Multi-Trees,” pp. 41-51, Springer, 2004.
[10] J. Fürnkranz, “Round Robin Classification,” J. Machine Learning Research, vol. 2, pp. 721-747, 2002.
[11] J. Gama, P. Medas, and R. Rocha, “Forest Trees for On-Line Data,” Proc. ACM Symp. Applied Computing (SAC '04), pp. 632-636, 2004.
[12] S.S. Ho, “A Martingale Framework for Concept Change Detection in Time-Varying Data Streams,” Proc. 22nd Int'l Conf. Machine Learning (ICML '05), pp. 321-327, 2005.
[13] C. Hsu and C. Lin, “A Comparison of Methods for Multi-Class Support Vector Machines,” IEEE Trans. Neural Networks, vol. 13, pp. 415-425, 2002.
[14] G. Hulten, L. Spencer, and P. Domingos, “Mining Time-Changing Data Streams,” Proc. ACM SIGKDD '01, pp. 97-106, 2001.
[15] H. Kargupta and H. Dutta, “Orthogonal Decision Trees,” Proc. Fourth IEEE Int'l Conf. Data Mining (ICDM '04), pp. 427-430, Aug. 2004.
[16] R. Khoussainov, A. Heß, and N. Kushmerick, “Ensembles of Biased Classifiers,” Proc. 22nd Int'l Conf. Machine Learning (ICML'05), pp. 425-432, 2005.
[17] J. Kittler, “Combining Classifiers: A Theoretical Framework,” Pattern Analysis and Applications, vol. 1, no. 1, pp. 18-27, 1998.
[18] E.B. Kong and T.G. Dietterich, “Why Error-Correcting Output Coding Works with Decision Trees,” technical report, Dept. of Computer Science, Oregon State Univ., 1995.
[19] L.I. Kuncheva and C.J. Whitaker, “Measures of Diversity in Classifier Ensembles,” Machine Learning, vol. 51, no. 2, pp. 181-207, 2003.
[20] T.M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[21] D.J. Newman, S. Hettich, C. Blake, and C. Merz, “UCI Repository of Machine Learning Databases,” 1998.
[22] R. Perdisci, G. Gu, and W. Lee, “Using an Ensemble of One-Class SVM Classifiers to Harden Payload-Based Anomaly Detection Systems,” Proc. Sixth Int'l Conf. Data Mining (ICDM'06), pp. 488-498, 2006.
[23] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[24] R. Rifkin and A. Klautau, “In Defense of One-vs-All Classification,” J. Machine Learning Research, vol. 5, pp. 101-141, 2004.
[25] W.N. Street and Y. Kim, “A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification,” Proc. ACM SIGKDD '01, pp. 377-382, 2001.
[26] A. Tsymbal, “The Problem of Concept Drift: Definitions and Related Work,” Technical Report TCD-CS-2004-15, Computer Science Dept., Trinity College Dublin, Ireland, 2004.
[27] H. Wang, W. Fan, P.S. Yu, and J. Han, “Mining Concept Drifting Data Streams Using Ensemble Classifiers,” Proc. ACM SIGKDD '03, pp. 226-235, 2003.
[28] P. Wang, H. Wang, X. Wu, W. Wang, and B. Shi, “On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams,” Proc. Fifth IEEE Int'l Conf. Data Mining (ICDM '05), pp. 474-481, 2005.
[29] G. Widmer and M. Kubat, “Learning in the Presence of Concept Drift and Hidden Contexts,” Machine Learning, vol. 23, pp. 69-101, 1996.
[30] Y. Yang, X. Wu, and X. Zhu, “Combining Proactive and Reactive Predictions for Data Streams,” Proc. ACM SIGKDD '05, pp. 710-715, 2005.
[31] Y. Yang, X. Wu, and X. Zhu, “Mining in Anticipation for Concept Change: Proactive-Reactive Prediction in Data Streams,” Data Mining and Knowledge Discovery, vol. 13, no. 3, pp. 261-289, 2006.

Index Terms:
Data mining, Machine learning
Sattar Hashemi, Ying Yang, Zahra Mirzamomen, Mohammadreza Kangavari, "Adapted One-versus-All Decision Trees for Data Stream Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 5, pp. 624-637, May 2009, doi:10.1109/TKDE.2008.181