Clustering over Multiple Evolving Streams by Events and Correlations
October 2007 (vol. 19 no. 10)
pp. 1349-1362
In applications involving multiple data streams, such as stock market trading and sensor network data analysis, the clusters of streams change at different times as the data evolve. Information about evolving clusters is valuable for supporting the corresponding online decisions. In this paper, we present a framework for Clustering Over Multiple Evolving sTreams by CORrelations and Events, abbreviated as COMET-CORE, which monitors the distribution of clusters over multiple data streams based on their correlations. Instead of directly reclustering the multiple data streams periodically, COMET-CORE applies efficient cluster split and merge processes only when significant cluster evolution happens. Accordingly, we devise an event detection mechanism to signal when cluster adjustments are needed. Incoming streams are smoothed into sequences of end points by piecewise linear approximation. Whenever end points are generated, the weighted correlations between streams are updated. End points are good indicators of significant changes in streams, and such changes are a main cause of cluster evolution events. When an event occurs, split and merge operations allow us to report the latest clustering results. As shown in our experimental studies, COMET-CORE can be performed effectively with good clustering quality.
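
As a rough illustration of the two building blocks named above (end points from piecewise linear approximation, and correlation-driven grouping of streams), the following is a minimal Python sketch. It is not the COMET-CORE algorithm: the greedy segmentation rule, the plain (unweighted) Pearson correlation, the one-pass grouping that stands in for the split and merge processes, and all names (`end_points`, `pearson`, `group_by_correlation`, `max_error`, `threshold`) are assumptions made for this example only.

```python
from math import sqrt


def end_points(values, max_error):
    """Greedy piecewise linear approximation (illustrative assumption).

    A segment grows while every sample strictly inside it stays within
    max_error of the straight line joining the segment's first and last
    sample. When that fails, the previous sample becomes an end point.
    Returns the indices of the end points.
    """
    if not values:
        return []
    points = [0]
    start = 0
    for i in range(2, len(values)):
        slope = (values[i] - values[start]) / (i - start)
        fits = all(
            abs(values[start] + slope * (j - start) - values[j]) <= max_error
            for j in range(start + 1, i)
        )
        if not fits:
            points.append(i - 1)  # close the segment at the previous sample
            start = i - 1
    if len(values) > 1:
        points.append(len(values) - 1)  # the final sample closes the last segment
    return points


def pearson(x, y):
    """Plain Pearson correlation; the paper uses a weighted variant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant stream carries no correlation information
    return cov / sqrt(var_x * var_y)


def group_by_correlation(streams, threshold):
    """One-pass grouping: a stream joins the first cluster whose members
    all correlate with it at or above threshold, else starts a new one.
    This is a stand-in for the event-driven split and merge processes."""
    clusters = []
    for name, series in streams.items():
        for cluster in clusters:
            if all(pearson(series, streams[m]) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


streams = {
    "a": [0, 1, 2, 3, 2, 1, 0],
    "b": [0, 2, 4, 6, 4, 2, 0],
    "c": [3, 2, 1, 0, 1, 2, 3],
}
print(end_points(streams["a"], max_error=0.1))       # [0, 3, 6]
print(group_by_correlation(streams, threshold=0.8))  # [['a', 'b'], ['c']]
```

In this toy input, the end point at index 3 marks the trend reversal in stream `a`, which is the kind of significant change the abstract describes as a trigger for re-examining clusters.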


Index Terms:
Data mining, data clustering, data streams
Citation:
Mi-Yen Yeh, Bi-Ru Dai, Ming-Syan Chen, "Clustering over Multiple Evolving Streams by Events and Correlations," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 10, pp. 1349-1362, Oct. 2007, doi:10.1109/TKDE.2007.1071