Issue No.10 - October (2007 vol.19)
In applications involving multiple data streams, such as stock market trading and sensor network data analysis, the clusters of streams change at different times as the data evolve. Information about evolving clusters is valuable for supporting the corresponding online decisions. In this paper, we present a framework for Clustering Over Multiple Evolving sTreams by CORrelations and Events, abbreviated as COMET-CORE, which monitors the distribution of clusters over multiple data streams based on their correlations. Instead of directly clustering the multiple data streams periodically, COMET-CORE applies efficient cluster split and merge processes only when significant cluster evolution happens. Accordingly, we devise an event detection mechanism to signal the cluster adjustments. The incoming streams are smoothed into sequences of end points by piecewise linear approximation. Whenever end points are generated, the weighted correlations between streams are updated. End points are good indicators of significant changes in streams, and such changes are a main cause of cluster evolution events. When an event occurs, split and merge operations allow us to report the latest clustering results. As shown in our experimental studies, COMET-CORE can be performed effectively with good clustering quality.
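To make the two building blocks named in the abstract more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a simple online segmentation rule (a segment is closed once any point deviates from the connecting line by more than a hypothetical `max_error`) and uses the plain Pearson coefficient as a stand-in for the paper's weighted correlation. The function names `end_points` and `pearson` are illustrative only.

```python
from math import sqrt


def end_points(values, max_error):
    """Indices of segment end points for a simple piecewise linear approximation.

    Assumption: a segment grows until some interior point lies farther than
    max_error from the straight line joining the segment start to the newest point.
    """
    if not values:
        return []
    ends = [0]
    start = 0
    for i in range(2, len(values)):
        slope = (values[i] - values[start]) / (i - start)
        deviation = max(
            abs(values[start] + slope * (k - start) - values[k])
            for k in range(start + 1, i)
        )
        if deviation > max_error:
            # Close the segment at the previous point and start a new one there.
            ends.append(i - 1)
            start = i - 1
    if len(values) > 1:
        ends.append(len(values) - 1)
    return ends


def pearson(x, y):
    """Plain (unweighted) Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # Convention chosen here for constant sequences.
    return cov / sqrt(var_x * var_y)


stream = [0, 1, 2, 3, 2, 1, 0]
print(end_points(stream, max_error=0.1))  # [0, 3, 6]
```

In an event-driven scheme of the kind the abstract describes, a newly emitted end point would be the moment to recompute correlations between the affected stream and the others, and a cluster split or merge would be considered only if those correlations cross some threshold.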
Data mining, data clustering, data streams
Mi-Yen Yeh, Bi-Ru Dai, Ming-Syan Chen, "Clustering over Multiple Evolving Streams by Events and Correlations", IEEE Transactions on Knowledge & Data Engineering, vol.19, no. 10, pp. 1349-1362, October 2007, doi:10.1109/TKDE.2007.1071