Issue No.05 - May (2010 vol.22)
pp: 730-742
Leandro L. Minku, University of Birmingham, Birmingham
Allan P. White, University of Birmingham, Birmingham
Xin Yao, University of Birmingham, Birmingham
ABSTRACT
Online learning algorithms often have to operate in the presence of concept drift (i.e., the concepts to be learned can change with time). This paper presents a new categorization for concept drift, separating drifts according to different criteria into mutually exclusive and nonheterogeneous categories. Moreover, although ensembles of learning machines have been used to learn in the presence of concept drift, there has been no deep study of why they can be helpful in that setting or of which of their features do or do not contribute. As diversity is one of these features, we present a diversity analysis in the presence of different types of drifts. We show that, before the drift, ensembles with less diversity obtain lower test errors. On the other hand, it is a good strategy to maintain highly diverse ensembles to obtain lower test errors shortly after the drift, regardless of the type of drift, even though high diversity is more important for more severe drifts. Longer after the drift, high diversity becomes less important. Diversity by itself can help to reduce the initial increase in error caused by a drift, but it does not provide faster recovery from drifts in the long term.
INDEX TERMS
Concept drift, online learning, neural network ensembles, diversity.
CITATION
Leandro L. Minku, Allan P. White, Xin Yao, "The Impact of Diversity on Online Ensemble Learning in the Presence of Concept Drift", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 5, pp. 730-742, May 2010, doi:10.1109/TKDE.2009.156