Issue No.02 - February (2011 vol.23)

pp: 307-320

Yun Yang, The University of Manchester, Manchester

Ke Chen, The University of Manchester, Manchester

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.112

ABSTRACT

Temporal data clustering provides underpinning techniques for discovering the intrinsic structure and condensing information over temporal data. In this paper, we present a temporal data clustering framework via a weighted clustering ensemble of multiple partitions produced by initial clustering analysis on different temporal data representations. In our approach, we propose a novel weighted consensus function guided by clustering validation criteria to reconcile initial partitions to candidate consensus partitions from different perspectives, and then introduce an agreement function to further reconcile those candidate consensus partitions to a final partition. As a result, the proposed weighted clustering ensemble algorithm provides an effective enabling technique for the joint use of different representations, which reduces the information loss in a single representation and exploits various information sources underlying temporal data. In addition, our approach tends to capture the intrinsic structure of a data set, e.g., the number of clusters. Our approach has been evaluated with benchmark time series, motion trajectory, and time-series data stream clustering tasks. Simulation results demonstrate that our approach yields favorable results for a variety of temporal data clustering tasks. As our weighted cluster ensemble algorithm can combine any input partitions to generate a clustering ensemble, we also investigate its limitation by formal analysis and empirical studies.
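
The abstract does not spell out the weighted consensus function or the validation criteria, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a weighted co-association matrix as the consensus mechanism and uses made-up weights in place of real validation scores; the function names `coassociation`, `weighted_consensus`, and `cut_consensus` are hypothetical, and the agreement step that merges candidate consensus partitions is not modeled.

```python
import numpy as np


def coassociation(labels):
    """Return an n x n 0/1 matrix: 1 where two points share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def weighted_consensus(partitions, weights):
    """Weighted average of co-association matrices.

    `weights` stands in for clustering-validation scores (an assumption);
    they are normalized to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * coassociation(p) for wi, p in zip(w, partitions))


def cut_consensus(similarity, threshold=0.5):
    """Label connected components of the graph whose edges exceed `threshold`."""
    n = similarity.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(similarity[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical partitions of six sequences, e.g. one per representation.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
weights = [0.5, 0.2, 0.3]  # hypothetical validation scores

similarity = weighted_consensus(partitions, weights)
consensus = cut_consensus(similarity)
print(consensus)
```

In this toy case the two partitions that agree carry most of the weight, so the thresholded matrix splits the six items into two groups even though the second partition proposes three clusters.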

INDEX TERMS

Temporal data clustering, clustering ensemble, different representations, weighted consensus function, model selection.

CITATION

Yun Yang, Ke Chen, "Temporal Data Clustering via Weighted Clustering Ensemble with Different Representations",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 23, no. 2, pp. 307-320, February 2011, doi:10.1109/TKDE.2010.112