Issue No.05 - May (2009 vol.21)
Hung-Leng Chen , National Taiwan University, Taipei
Ming-Syan Chen , National Taiwan University, Taipei
Su-Chen Lin , National Taiwan University, Taipei
DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.192
Although the problem of clustering numerical time-evolving data is well explored, the problem of clustering categorical time-evolving data remains a challenging issue. In this paper, we propose a generalized clustering framework that utilizes existing clustering algorithms and adopts a sliding window technique to detect whether a drifting concept occurs in the incoming sliding window. The framework is composed of two algorithms: the Drifting Concept Detecting (DCD) algorithm, which detects changes in cluster distributions between the current sliding window and the last clustering result, and the Cluster Relationship Analysis (CRA) algorithm, which analyzes the relationship between clustering results at different times. In DCD, the concept is said to drift if a large number of outliers are found in the current sliding window, or if a large number of clusters vary in their ratio of data points. A sliding window in which drift is detected is re-clustered to capture the recent concept. In CRA, a visualization method is devised to facilitate the observation of the evolving clustering results. The framework is validated on real and synthetic data sets and is shown not only to detect drifting concepts accurately but also to attain clustering results of better quality.
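The two drift conditions described for DCD (too many outliers, or too many clusters whose share of data points has shifted) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each point in the current window has already been assigned to a cluster of the last clustering result, or marked as an outlier (`None`), by some categorical similarity measure that is not shown here. The function names `cluster_ratios` and `drift_detected`, the three threshold parameters, and their default values are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Optional, Sequence

# Hypothetical label type: a cluster id, or None for a point that fits no
# cluster of the last clustering result (i.e., an outlier).
Label = Optional[Hashable]


def cluster_ratios(labels: Sequence[Label]) -> Dict[Hashable, float]:
    """Fraction of all points in the window assigned to each cluster (outliers excluded from keys)."""
    n = len(labels)
    counts = Counter(lbl for lbl in labels if lbl is not None)
    return {c: k / n for c, k in counts.items()}


def drift_detected(
    last_ratios: Mapping[Hashable, float],
    window_labels: Sequence[Label],
    outlier_threshold: float = 0.1,  # hypothetical: max tolerated outlier fraction
    ratio_threshold: float = 0.1,    # hypothetical: max tolerated per-cluster ratio change
    cluster_threshold: float = 0.5,  # hypothetical: max tolerated fraction of varied clusters
) -> bool:
    """Sketch of a DCD-style test: flag drift on many outliers or many varied clusters."""
    if not window_labels:
        return False
    n = len(window_labels)

    # Condition 1: too many points in the window fit no existing cluster.
    outlier_frac = sum(1 for lbl in window_labels if lbl is None) / n
    if outlier_frac > outlier_threshold:
        return True

    # Condition 2: too many clusters changed their share of data points.
    current = cluster_ratios(window_labels)
    varied = sum(
        1 for c, r in last_ratios.items()
        if abs(current.get(c, 0.0) - r) > ratio_threshold
    )
    return bool(last_ratios) and varied / len(last_ratios) > cluster_threshold


if __name__ == "__main__":
    last = {"A": 0.5, "B": 0.5}  # cluster ratios from the last clustering result
    print(drift_detected(last, ["A", "B", "A", "B"]))   # False: same distribution
    print(drift_detected(last, ["A", None, None, "B"]))  # True: 50% outliers
    print(drift_detected(last, ["A", "A", "A", "B"]))   # True: both clusters shifted by 0.25
```

In a sliding-window loop, a `True` result would be the point at which the window is handed to an existing clustering algorithm for re-clustering, and the resulting cluster ratios would replace `last_ratios` for the next window; a `False` result keeps the last clustering result in place.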
Clustering, classification, and association rules, Data mining, Mining methods and algorithms
Hung-Leng Chen, Ming-Syan Chen, Su-Chen Lin, "Catching the Trend: A Framework for Clustering Concept-Drifting Categorical Data", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 5, pp. 652-665, May 2009, doi:10.1109/TKDE.2008.192