Issue No.05 - May (2013 vol.25)
pp: 1175-1180
Ludmila I. Kuncheva, University of Bangor, Bangor
ABSTRACT
Change detection in streaming data relies on a fast estimation of the probability that the data in two consecutive windows come from different distributions. Choosing the criterion is one of the many questions that need to be addressed when designing a change detection procedure. This paper gives a log-likelihood justification for two well-known criteria for detecting change in streaming multidimensional data: Kullback-Leibler (K-L) distance and Hotelling's T-square test for equal means (H). We propose a semiparametric log-likelihood criterion (SPLL) for change detection. Compared to the existing log-likelihood change detectors, SPLL trades some theoretical rigor for computational simplicity. We examine SPLL together with K-L and H on detecting induced change on 30 real data sets. The criteria were compared using the area under the respective Receiver Operating Characteristic (ROC) curve (AUC). SPLL was found to be on a par with H and better than K-L for the nonnormalized data, and better than both on the normalized data.
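As a rough illustration of one of the criteria named in the abstract, the following sketch computes the textbook two-sample Hotelling T-square statistic for two consecutive windows. It is not taken from the paper: the function name `hotelling_t2`, the synthetic windows, and the use of a pseudo-inverse for the pooled covariance are all assumptions made for this example, and the SPLL and K-L criteria are not shown.

```python
import numpy as np

def hotelling_t2(w1, w2):
    """Two-sample Hotelling T-square statistic for windows of shape (n, p).

    Hypothetical helper for illustration; larger values suggest that the
    window means differ.
    """
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    n1, n2 = len(w1), len(w2)
    diff = w1.mean(axis=0) - w2.mean(axis=0)
    # Pooled covariance, assuming both windows share a common covariance.
    s1 = np.atleast_2d(np.cov(w1, rowvar=False))
    s2 = np.atleast_2d(np.cov(w2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    # The pseudo-inverse tolerates a singular pooled covariance
    # (a choice made for this sketch, not something the source specifies).
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.pinv(pooled) @ diff)

# Synthetic windows (assumed data): one unchanged, one with a shifted mean.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
same = rng.normal(size=(50, 3))
shifted = rng.normal(loc=2.0, size=(50, 3))

t2_same = hotelling_t2(reference, same)
t2_shifted = hotelling_t2(reference, shifted)
```

In this setup the statistic for the shifted window should come out far larger than for the unchanged one. Turning the statistic into a detection decision would additionally require a threshold, for example from the F distribution under the equal-means hypothesis.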
INDEX TERMS
Detectors, Approximation methods, Covariance matrix, Kernel, Upper bound, Arrays, Monte Carlo methods, log-likelihood detector, Change detection, multidimensional data streams, Hotelling's T-square
CITATION
Ludmila I. Kuncheva, "Change Detection in Streaming Multivariate Data Using Likelihood Detectors", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 5, pp. 1175-1180, May 2013, doi:10.1109/TKDE.2011.226