Issue No. 4 - April 2012 (vol. 24)
pp. 619-633
Leandro L. Minku , The University of Birmingham, Birmingham
Xin Yao , The University of Birmingham, Birmingham
Online learning algorithms often have to operate in the presence of concept drift. A recent study revealed that different diversity levels in an ensemble of learning machines are required to maintain high generalization on both old and new concepts. Inspired by this study, and based on a further analysis of diversity under different strategies to deal with drifts, we propose a new online ensemble learning approach called Diversity for Dealing with Drifts (DDD). DDD maintains ensembles with different diversity levels and attains better accuracy than other approaches. It is also very robust, outperforming other drift handling approaches in terms of accuracy when drift detections are false positives. In all the experimental comparisons we carried out, DDD performed at least as well as the other drift handling approaches under various conditions, with very few exceptions.
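The diversity mechanism DDD builds on can be illustrated with online bagging [1], [38], in which each base learner trains on every incoming example k times, with k drawn from a Poisson(λ) distribution. In the study that inspired DDD [9], λ = 1 reproduces standard (low-diversity) online bagging, while λ < 1 induces higher diversity, since learners see fewer and more dissimilar examples. The sketch below is a minimal illustration under these assumptions: the class name, parameters, and the toy base learner (a class-frequency counter) are placeholders for exposition, not the paper's implementation, which uses real base learners such as ITI decision trees [35] and MLPs.

```python
import math
import random
from collections import Counter


class OnlineBaggingEnsemble:
    """Online bagging in the style of Oza and Russell [1], [38].

    Each base learner trains on every example k times, with
    k ~ Poisson(lam). Following the diversity study [9], lam = 1 gives
    standard (low-diversity) online bagging; lam < 1 yields a
    high-diversity ensemble. The base learner here is a toy
    class-frequency counter used purely to keep the sketch runnable.
    """

    def __init__(self, n_learners=10, lam=1.0, seed=0):
        self.lam = lam
        self.rng = random.Random(seed)
        self.learners = [Counter() for _ in range(n_learners)]

    def _poisson(self):
        # Knuth's Poisson sampler; avoids an external dependency.
        threshold, k, p = math.exp(-self.lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def learn(self, x, y):
        for learner in self.learners:
            for _ in range(self._poisson()):
                learner[y] += 1  # a real base learner would train on (x, y)

    def predict(self, x):
        # Unweighted majority vote over the base learners.
        votes = Counter()
        for learner in self.learners:
            if learner:
                votes[learner.most_common(1)[0][0]] += 1
        return votes.most_common(1)[0][0] if votes else None
```

DDD itself maintains both a low-diversity and a high-diversity ensemble (e.g., λ = 1 and λ ≪ 1) before a drift is detected, and combines old and newly created ensembles after a detection; the λ parameter above is the knob that realises those two diversity levels.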
Keywords: Concept drift, online learning, ensembles of learning machines, diversity.
Leandro L. Minku and Xin Yao, "DDD: A New Ensemble Approach for Dealing with Concept Drift," IEEE Trans. Knowledge and Data Eng., vol. 24, no. 4, pp. 619-633, Apr. 2012, doi:10.1109/TKDE.2011.58.
[1] N.C. Oza and S. Russell, "Experimental Comparisons of Online and Batch Versions of Bagging and Boosting," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 359-364, 2001.
[2] A. Fern and R. Givan, "Online Ensemble Learning: An Empirical Study," Machine Learning, vol. 53, pp. 71-109, 2003.
[3] R. Polikar, L. Udpa, S.S. Udpa, and V. Honavar, "Learn++: An Incremental Learning Algorithm for Supervised Neural Networks," IEEE Trans. Systems, Man, and Cybernetics - Part C, vol. 31, no. 4, pp. 497-508, Nov. 2001.
[4] F.L. Minku, H. Inoue, and X. Yao, "Negative Correlation in Incremental Learning," Natural Computing J., Special Issue on Nature-Inspired Learning and Adaptive Systems, vol. 8, no. 2, pp. 289-320, 2009.
[5] H. Abdulsalam, D.B. Skillicorn, and P. Martin, "Streaming Random Forests," Proc. Int'l Database Eng. and Applications Symp. (IDEAS), pp. 225-232, 2007.
[6] A. Narasimhamurthy and L.I. Kuncheva, "A Framework for Generating Data to Simulate Changing Environments," Proc. 25th IASTED Int'l Multi-Conf.: Artificial Intelligence and Applications, pp. 384-389, 2007.
[7] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with Drift Detection," Proc. Seventh Brazilian Symp. Artificial Intelligence (SBIA '04), pp. 286-295, 2004.
[8] J. Gao, W. Fan, and J. Han, "On Appropriate Assumptions to Mine Data Streams: Analysis and Practice," Proc. IEEE Int'l Conf. Data Mining (ICDM), pp. 143-152, 2007.
[9] F.L. Minku, A. White, and X. Yao, "The Impact of Diversity on On-Line Ensemble Learning in the Presence of Concept Drift," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 5, pp. 730-742, May 2010.
[10] N. Littlestone and M.K. Warmuth, "The Weighted Majority Algorithm," Information and Computation, vol. 108, pp. 212-261, 1994.
[11] N. Kasabov, Evolving Connectionist Systems. Springer, 2003.
[12] M. Baena-García, J. Del Campo-Ávila, R. Fidalgo, and A. Bifet, "Early Drift Detection Method," Proc. Fourth ECML PKDD Int'l Workshop Knowledge Discovery from Data Streams (IWKDDS '06), pp. 77-86, 2006.
[13] A. Dawid and V. Vovk, "Prequential Probability: Principles and Properties," Bernoulli, vol. 5, no. 1, pp. 125-162, 1999.
[14] W. Street and Y. Kim, "A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 377-382, 2001.
[15] H. Wang, W. Fan, P.S. Yu, and J. Han, "Mining Concept-Drifting Data Streams Using Ensemble Classifiers," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 226-235, 2003.
[16] F. Chu and C. Zaniolo, "Fast and Light Boosting for Adaptive Mining of Data Streams," Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '04), pp. 282-292, 2004.
[17] M. Scholz and R. Klinkenberg, "An Ensemble Classifier for Drifting Concepts," Proc. Second Int'l Workshop Knowledge Discovery from Data Streams, pp. 53-64, 2005.
[18] M. Scholz and R. Klinkenberg, "Boosting Classifiers for Drifting Concepts," Intelligent Data Analysis, Special Issue on Knowledge Discovery from Data Streams, vol. 11, no. 1, pp. 3-28, 2007.
[19] S. Ramamurthy and R. Bhatnagar, "Tracking Recurrent Concept Drift in Streaming Data Using Ensemble Classifiers," Proc. Int'l Conf. Machine Learning and Applications (ICMLA '07), pp. 404-409, 2007.
[20] J. Gao, W. Fan, J. Han, and P. Yu, "A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions," Proc. SIAM Int'l Conf. Data Mining (SDM), 2007.
[21] H. He and S. Chen, "IMORL: Incremental Multiple-Object Recognition and Localization," IEEE Trans. Neural Networks, vol. 19, no. 10, pp. 1727-1738, Oct. 2008.
[22] K. Nishida and K. Yamauchi, "Adaptive Classifiers-Ensemble System for Tracking Concept Drift," Proc. Sixth Int'l Conf. Machine Learning and Cybernetics (ICMLC '07), pp. 3607-3612, 2007.
[23] K. Nishida and K. Yamauchi, "Detecting Concept Drift Using Statistical Testing," Proc. 10th Int'l Conf. Discovery Science (DS '07), pp. 264-269, 2007.
[24] K. Nishida, "Learning and Detecting Concept Drift," PhD dissertation, Hokkaido Univ., knishida/paper/nishida2008-dissertation.pdf, 2008.
[25] K.O. Stanley, "Learning Concept Drift with a Committee of Decision Trees," Technical Report AI-TR-03-302, Dept. of Computer Sciences, Univ. of Texas, Austin, 2003.
[26] J.Z. Kolter and M.A. Maloof, "Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts," J. Machine Learning Research, vol. 8, pp. 2755-2790, 2007.
[27] J.Z. Kolter and M.A. Maloof, "Using Additive Expert Ensembles to Cope with Concept Drift," Proc. Int'l Conf. Machine Learning (ICML '05), pp. 449-456, 2005.
[28] K. Tumer and J. Ghosh, "Error Correlation and Error Reduction in Ensemble Classifiers," Connection Science, vol. 8, no. 3, pp. 385-404, 1996.
[29] L.I. Kuncheva and C.J. Whitaker, "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy," Machine Learning, vol. 51, pp. 181-207, 2003.
[30] E.K. Tang, P.N. Suganthan, and X. Yao, "An Analysis of Diversity Measures," Machine Learning, vol. 65, pp. 247-271, 2006.
[31] G. Yule, "On the Association of Attributes in Statistics," Philosophical Trans. Royal Soc. of London, Series A, vol. 194, pp. 257-319, 1900.
[32] J. Schlimmer and R. Granger, "Beyond Incremental Processing: Tracking Concept Drift," Proc. Fifth Nat'l Conf. Artificial Intelligence (AAAI), pp. 502-507, 1986.
[33] "The UCI KDD Archive," databases/kddcup99/kddcup99.html, 1999.
[34] M. Harries, "Splice-2 Comparative Evaluation: Electricity Pricing," Technical Report UNSW-CSE-TR-9905, Artificial Intelligence Group, School of Computer Science and Eng., The Univ. of New South Wales, Sydney, 1999.
[35] P. Utgoff, N. Berkman, and J. Clouse, "Decision Tree Induction Based on Efficient Tree Restructuring," Machine Learning, vol. 29, no. 1, pp. 5-44, 1997.
[36] F.L. Minku and X. Yao, "Using Diversity to Handle Concept Drift in On-Line Learning," Proc. Int'l Joint Conf. Neural Networks (IJCNN '09), pp. 2125-2132, 2009.
[37] I.H. Witten and E. Frank, Data Mining - Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, 2000.
[38] N.C. Oza and S. Russell, "Online Bagging and Boosting," Proc. IEEE Int'l Conf. Systems, Man and Cybernetics, vol. 3, pp. 2340-2345, 2005.
[39] F.L. Minku and X. Yao, "On-Line Bagging Negative Correlation Learning," Proc. Int'l Joint Conf. Neural Networks (IJCNN '08), pp. 1375-1382, 2008.
[40] S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, "Handling Imbalanced Datasets: A Review," GESTS Int'l Trans. Computer Science and Eng., vol. 30, no. 1, pp. 25-36, 2006.