loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
A Shrinking-Based Clustering Approach for Multidimensional Data
October 2005 (vol. 17 no. 10)
pp. 1389-1403
Existing data analysis techniques have difficulty in handling multidimensional data. Multidimensional data has been a challenge for data analysis because of the inherent sparsity of the points. In this paper, we first present a novel data preprocessing technique called shrinking which optimizes the inherent characteristic of distribution of data. This data reorganization concept can be applied in many fields such as pattern recognition, data clustering, and signal processing. Then, as an important application of the data shrinking preprocessing, we propose a shrinking-based approach for multidimensional data analysis which consists of three steps: data shrinking, cluster detection, and cluster evaluation and selection. The process of data shrinking moves data points along the direction of the density gradient, thus generating condensed, widely-separated clusters. Following data shrinking, clusters are detected by finding the connected components of dense cells (and evaluated by their compactness). The data-shrinking and cluster-detection steps are conducted on a sequence of grids with different cell sizes. The clusters detected at these scales are compared by a cluster-wise evaluation measurement, and the best clusters are selected as the final result. The experimental results show that this approach can effectively and efficiently detect clusters in both low and high-dimensional spaces.

[1] 1389 C. Aggarwal , C. Procopiuc , J. Wolf , P. Yu , and J. Park , “Fast Algorithms for Projected Clustering,” Proc. ACM SIGMOD Conf. Management of Data, pp. 61-72, 1999. [2] R. Agrawal , J. Gehrke , D. Gunopulos , and P. Raghavan , “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,” Proc. ACM SIGMOD Conf. Management of Data, pp. 94-105, 1998. [3] N. Ahuja , “Dot Pattern Processing Using Voronoi Neighborhoods,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 4, no. 3, pp. 336-343, May 1982.[4] M. Ankerst , M.M. Breunig , H.-P. Kriegel , and J. Sander , “OPTICS: Ordering Points To Identify the Clustering Structure,” Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '99), pp. 49-60, 1999.[5] S.D. Bay The UCI KDD Archive, http:/kdd.ics.uci.edu, Univ. of California, Irvine, Dept. of Information and Computer Science, 2004.[6] C.-F. Chen and J.-M. Lee , “The Validity Measurement of Fuzzy C-Means Classifier for Remotely Sensed Images,” Proc. ACRS 2001— 22nd Asian Conf. Remote Sensing, 2001.[7] D. Comaniciu and P. Meer , “Mean Shift: A Robust Approach toward Feature Space Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002. [8] T.H. Cormen , C.E. Leiserson , and R.L. Rivest , Introduction to Algorithms. The MIT Press, 1990.[9] D.W. Scott , Multivariate Density Estimation: Theory, Practice, and Visualization. 1992.[10] M. Ester , K. H.-P , J. Sander , and X. Xu , “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proc. Second Int'l Conf. Knowledge Discovery and Data Mining, 1996.[11] U.M. Fayyad , G. Piatetsky-Shapiro , P. Smyth , and R. Uthurusamy , Advances in Knowledge Discovery and Data Mining. AAAI Press, 1996.[12] S. Guha , R. Rastogi , and K. Shim , “Cure: An Efficient Clustering Algorithm for Large Databases,” Proc. ACM SIGMOD Conf. Management of Data, pp. 73-84, 1998.[13] S. Guha , R. Rastogi , and K. Shim , “Rock: A Robust Clustering Algorithm for Categorical Attributes,” Proc. IEEE Conf. Data Eng., 1999.[14] A. Hinneburg and D.A. Keim , “An Efficient Approach to Clustering in Large Multimedia Databases with Noise,” Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, pp. 58-65, Aug. 1998.[15] J. MacQueen , “Some Methods for Classification and Analysis of Multivariate Observations,” Proc. Fifth Berkeley Symp. Math. Statistics and Probability, vol. I, Statistics, 1967.[16] A. Jain , M. Murty , and P. Flyn , “Data Clustering: A Review,” ACM Computing Surveys, vol. 31, no. 3, 1999.[17] J. Han and M. Kamber , Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.[18] K. Fukunaga , Introduction to Statistical Pattern Recognition. 1990.[19] L. Kaufman and P.J. Rousseeuw , Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.[20] J. Kleinberg , “An Impossibility Theorem for Clustering,” Advances in Neural Information Processing Systems (NIPS) 15, 2002.[21] M. Halkidi and M. Vazirgiannis , “A Data Set Oriented Approach for Clustering Algorithm Selection,” Proc. Fifth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '01), 2001.[22] M. Halkidi and M. Vazirgiannis , “Clustering Validity Assessment: Finding the Optimal Partitioning of a Data Set,” Proc. Int'l Conf. Data Mining, 2001.[23] M. Halkidi , Y. Batistakis , and M. Vazirgiannis , “Clustering Algorithms and Validity Measures,” Proc. Scientific and Statistical Database Management Conf., 2001.[24] R.T. Ng and J. Han , “Efficient and Effective Clustering Methods for Spatial Data Mining,” Proc. 20th Very Large Data Bases Conf., pp. 144-155, 1994.[25] S. Papadimitriou , H. Kitagawa , P.B. Gibbons , and C. Faloutsos , “Loci: Fast Outlier Detection Using the Local Correlation Integral,” Proc. Int'l Conf. Data Eng., 2003.[26] T. Seidl and H. Kriegel , “Optimal Multi-Step k-Nearest Neighbor Search,” Proc. ACM SIGMOD Conf. Management of Data, pp. 154-164, 1998. [27] G. Sheikholeslami , S. Chatterjee , and A. Zhang , “Wavecluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases,” Proc. 24th Int'l Conf. Very Large Data Bases, 1998.[28] J.H. Wang and J.D. Rau , “Vq-Agglomeration: A Novel Approach to Clustering,” IEE Proc. Vision, Image, and Signal Processing, vol. 148, no. 1, pp. 36-44, 2001.[29] W. Wang , J. Yang , and R. Muntz , “STING: A Statistical Information Grid Approach to Spatial Data Mining,” Proc. 23rd Very Large Data Bases Conf., pp. 186-195, 1997.[30] W. Viessman Jr. and G.L. Lewis , Introduction to Hydrology, 4/e. Prentice Hall, 1996.[31] Y. Shi , Y. Song , and A. Zhang , “A Shrinking-Based Approach for Multi-Dimensional Data Analysis,” Proc. 29th Very Larage Data Bases Conf., Sept. 2003.[32] T. Zhang , R. Ramakrishnan , and M. Livny , “BIRCH: An Efficient Data Clustering Method for Very Large Databases,” Proc. 1996 ACM SIGMOD Int'l Conf. Management of Data, pp. 103-114, 1996.

Index Terms:
Index Terms- Data shrinking preprocessing, clustering, cluster evaluation.
Citation:
Yong Shi, Yuqing Song, Aidong Zhang, "A Shrinking-Based Clustering Approach for Multidimensional Data," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 10, pp. 1389-1403, Oct. 2005, doi:10.1109/TKDE.2005.157
Usage of this product signifies your acceptance of the Terms of Use.