Issue No.04 - April (2013 vol.25)
pp: 893-905
Junming Shao, Inst. for Comput. Sci., Ludwig-Maximilians-Univ. München, Munich, Germany
Xiao He, Inst. for Comput. Sci., Ludwig-Maximilians-Univ. München, Munich, Germany
C. Böhm, Inst. for Comput. Sci., Ludwig-Maximilians-Univ. München, Munich, Germany
Qinli Yang, Inst. for Infrastruct. & Environ., Ludwig-Maximilians-Univ. München, Munich, Germany
C. Plant, Dept. of Sci. Comput., Florida State Univ., Tallahassee, FL, USA
ABSTRACT
Synchronization is a powerful and inherently hierarchical concept regulating a large variety of complex processes, ranging from the metabolism in a cell to opinion formation in a group of individuals. Synchronization phenomena in nature have been widely investigated, and models concisely describing the dynamical synchronization process have been proposed, e.g., the well-known Extensive Kuramoto Model. We explore the potential of the Extensive Kuramoto Model for data clustering. We regard each data object as a phase oscillator and simulate the dynamical behavior of the objects over time. By interacting with similar objects, the phase of an object gradually aligns with its neighborhood, resulting in a nonlinear object movement naturally driven by the local cluster structure. We demonstrate that our framework has several attractive benefits: 1) It is suitable for detecting clusters of arbitrary number, shape, and data distribution, even in difficult settings with noise points and outliers. 2) Combined with the Minimum Description Length (MDL) principle, it allows partitioning and hierarchical clustering without requiring any input parameters that are difficult to estimate. 3) Synchronization faithfully captures the natural hierarchical cluster structure of the data, and MDL suggests meaningful levels of abstraction. Extensive experiments demonstrate the effectiveness and efficiency of our approach.
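The object movement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-dimension sine coupling, the neighborhood radius `eps`, the fixed step count, and the function names `sync_step` and `synchronize` are all assumptions chosen for illustration. Each object is pulled toward the objects within distance `eps` of it (itself included), so objects in the same local cluster collapse toward a common location while isolated objects stay where they are.

```python
import math


def sync_step(points, eps):
    """One synchronous update: every point moves toward its eps-neighborhood.

    Assumption: Kuramoto-style coupling applied independently per dimension,
    averaged over all points within Euclidean distance eps (self included).
    """
    updated = []
    for p in points:
        # The neighborhood includes p itself, so it is never empty.
        neighbors = [q for q in points if math.dist(p, q) <= eps]
        updated.append([
            x + sum(math.sin(q[d] - x) for q in neighbors) / len(neighbors)
            for d, x in enumerate(p)
        ])
    return updated


def synchronize(points, eps, steps=20):
    """Repeat the update a fixed number of times (hypothetical stopping rule)."""
    for _ in range(steps):
        points = sync_step(points, eps)
    return points


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # first group
            [1.0, 1.0], [1.1, 1.0],               # second group
            [5.0, 5.0]]                           # isolated point
    for original, final in zip(data, synchronize(data, eps=0.3)):
        print(original, "->", [round(v, 4) for v in final])
```

In this sketch, objects that end up at (nearly) the same location would be read off as one cluster, and an object that never moves has no neighbors within `eps`. A fixed `eps` is used here only to keep the example short; the abstract states that the approach combines synchronization with the MDL principle so that such parameters need not be supplied by hand.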
INDEX TERMS
Synchronization, Clustering algorithms, Oscillators, Data models, Partitioning algorithms, Biological system modeling, Heuristic algorithms, Kuramoto model, Clustering
CITATION
Junming Shao, Xiao He, C. Böhm, Qinli Yang, C. Plant, "Synchronization-Inspired Partitioning and Hierarchical Clustering," IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 4, pp. 893-905, April 2013, doi:10.1109/TKDE.2012.32