Issue No. 10, Oct. 2013 (vol. 25)
pp. 2343-2355
Hock Hee Ang, Nanyang Technological University, Singapore
Vivekanand Gopalkrishnan, Deloitte Analytics Institute Asia, Singapore
Indre Zliobaite, Bournemouth University, Poole
Mykola Pechenizkiy, Eindhoven University of Technology, The Netherlands
Steven C.H. Hoi, Nanyang Technological University, Singapore
ABSTRACT
In a distributed computing environment, peers collaboratively learn to classify concepts of interest from each other. When external changes happen and their concepts drift, the peers should adapt to avoid an increase in misclassification errors. The problem of adaptation becomes more difficult when the changes are asynchronous, i.e., when peers experience drifts at different times. We address this problem by developing an ensemble approach, PINE, that combines reactive adaptation via drift detection with proactive handling of upcoming changes via early warning and adaptation across the peers. Through an empirical study on simulated and real-world data sets, we show that PINE handles asynchronous concept drifts better and faster than current state-of-the-art approaches, which were designed to work in less challenging environments. In addition, PINE is parameter insensitive and incurs less communication cost while achieving better accuracy.
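The abstract combines two ingredients: reactive adaptation triggered by a local drift detector, and proactive early warnings passed between peers. The sketch below illustrates both ideas under stated assumptions; it is not PINE itself. The detector is a DDM-style error-rate monitor (Gama et al., 2004), swapped in for illustration because the abstract does not say which detector PINE uses. The `Peer` class, its `receive_alert` method, the `model_version` placeholder for retraining, and the policy of merely flagging an early warning are all hypothetical.

```python
import math


class DDMDetector:
    """Error-rate drift detector in the style of DDM (Gama et al., 2004).

    Tracks the running error rate p and s = sqrt(p * (1 - p) / n), remembering
    the point where p + s was smallest. Reports "warning" once
    p + s > p_min + 2 * s_min and "drift" once p + s > p_min + 3 * s_min.
    """

    def __init__(self, min_samples: int = 30) -> None:
        self.min_samples = min_samples  # no decisions before this many samples
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.p = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        self.n += 1
        self.p += (float(error) - self.p) / self.n  # incremental mean of 0/1 errors
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + 3.0 * self.s_min:
            self.reset()  # start monitoring the new concept from scratch
            return "drift"
        if self.p + s > self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"


class Peer:
    """Hypothetical peer: reacts to its own drift and warns its neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.detector = DDMDetector()
        self.neighbours = []         # peers to notify when this peer drifts
        self.model_version = 0       # placeholder for "retrain local classifier"
        self.early_warning = False   # True once a neighbour has reported drift
        self.alerts_from = []        # names of peers that sent alerts

    def observe(self, error: bool) -> str:
        """Feed one prediction outcome (True = misclassified) to the detector."""
        state = self.detector.update(error)
        if state == "drift":
            self.model_version += 1  # reactive adaptation (placeholder)
            self.early_warning = False
            for peer in self.neighbours:
                peer.receive_alert(self.name)
        return state

    def receive_alert(self, sender: str) -> None:
        # Proactive step (assumed policy): record that a change may be coming
        # before it shows up in this peer's own error rate.
        self.early_warning = True
        self.alerts_from.append(sender)
```

For example, feeding a peer a 10% error rate for 200 steps and then a 50% error rate should trigger a drift shortly after the change and set `early_warning` on its neighbours. In a fuller design, a neighbour could use that flag, or its own `"warning"` state, to start training a replacement model before drift is confirmed locally, which is the usual motivation for DDM's two thresholds.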
INDEX TERMS
Data models, Predictive models, Accuracy, Distributed databases, Adaptation models, Detectors, Training, distributed systems, concept drift, Classification
CITATION
Hock Hee Ang, Vivekanand Gopalkrishnan, Indre Zliobaite, Mykola Pechenizkiy, Steven C.H. Hoi, "Predictive Handling of Asynchronous Concept Drifts in Distributed Environments", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 10, pp. 2343-2355, Oct. 2013, doi:10.1109/TKDE.2012.172