Issue No.11 - November (2009 vol.21)
pp: 1515-1531
Xiao-Feng Wang, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei Anhui
De-Shuang Huang, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei Anhui
ABSTRACT
In this paper, a new density-based clustering framework is proposed by adopting the assumption that the cluster centers in data space can be regarded as target objects in image space. First, level set evolution is adopted to find an approximation of the cluster centers by using a new initial boundary formation scheme. Accordingly, three types of initial boundaries are defined so that each of them can evolve to approach the cluster centers in a different way. To avoid the long iteration time of level set evolution in data space, an efficient termination criterion is presented that stops the evolution process when no more cluster centers can be found. Then, a new effective density representation called level set density (LSD) is constructed from the evolution results. Finally, valley seeking clustering is used to group data points into their corresponding clusters based on the LSD. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of the proposed clustering framework. Comparisons with the DBSCAN, OPTICS, and valley seeking clustering methods further show that the proposed framework can successfully avoid the overfitting phenomenon and resolve the confusion between cluster boundary points and outliers.
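As a rough illustration of the final grouping step, the Python sketch below links every point to its densest neighbour within a fixed radius and labels each point by the density peak it eventually reaches. It is a simplified mode-seeking procedure in the spirit of valley seeking clustering, not the authors' algorithm: the abstract does not define LSD, so a plain neighbour count stands in for it, and the function names, the `radius` parameter, and the toy data are all assumptions made for this example.

```python
import math


def neighbour_density(points, radius):
    """Count the points within `radius` of each point (itself included).

    This is only a stand-in density; it is NOT the level set density (LSD).
    """
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]


def valley_seeking_labels(points, density, radius):
    """Link each point to its densest neighbour within `radius`, then
    label every point by the density peak (root) its chain of links reaches."""
    n = len(points)
    parent = list(range(n))
    for i in range(n):
        best = i
        for j in range(n):
            if j == i or math.dist(points[i], points[j]) > radius:
                continue
            # Higher density wins; equal densities are broken by lower index,
            # which gives a strict ordering and therefore no cycles.
            if (density[j], -j) > (density[best], -best):
                best = j
        parent[i] = best

    def root(i):
        # Follow the links uphill until a point that is its own parent.
        while parent[i] != i:
            i = parent[i]
        return i

    ids = {}
    # Number the clusters in order of first appearance of their root.
    return [ids.setdefault(root(i), len(ids)) for i in range(n)]


# Hypothetical toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
radius = 1.5
density = neighbour_density(points, radius)
labels = valley_seeking_labels(points, density, radius)
print(labels)
```

Because points only ever link toward denser neighbours, the sparse gap between the two groups acts as the valley that separates them. A different density estimate could be passed in as `density` without changing the linking step.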
INDEX TERMS
Density-based clustering, initial boundary, level set method, level set density, valley seeking clustering.
CITATION
Xiao-Feng Wang, De-Shuang Huang, "A Novel Density-Based Clustering Framework by Using Level Set Method", IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 11, pp. 1515-1531, November 2009, doi:10.1109/TKDE.2009.21