CLARANS: A Method for Clustering Objects for Spatial Data Mining
September/October 2002 (vol. 14 no. 5)
pp. 1003-1016

Abstract: Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper makes three main contributions. First, we propose a new clustering method called CLARANS, whose aim is to identify spatial structures that may be present in the data. Experimental results indicate that, when compared with existing clustering methods, CLARANS is very efficient and effective. Second, we investigate how CLARANS can efficiently handle not only point objects, but also polygon objects. One of the methods considered, called the IR-approximation, is very efficient in clustering convex and nonconvex polygon objects. Third, building on CLARANS, we develop two spatial data mining algorithms that aim to discover relationships between spatial and nonspatial attributes. Both algorithms can discover knowledge that is difficult to find with existing spatial data mining algorithms.

Index Terms:
Spatial data mining, clustering algorithms, randomized search, computational geometry.
Citation:
Raymond T. Ng, Jiawei Han, "CLARANS: A Method for Clustering Objects for Spatial Data Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 5, pp. 1003-1016, Sept.-Oct. 2002, doi:10.1109/TKDE.2002.1033770