Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining
December 1996 (vol. 8 no. 6)
pp. 884-897

Abstract—In this paper, we study two spatial knowledge discovery problems involving proximity relationships between clusters and features. The first problem is: Given a cluster of points, how can we efficiently find features (represented as polygons) that are closest to the majority of points in the cluster? We measure proximity in an aggregate sense due to the nonuniform distribution of points in a cluster (e.g., houses on a map), and the different shapes and sizes of features (e.g., natural or man-made geographic features). The second problem is: Given n clusters of points, how can we extract the aggregate proximity commonalities (i.e., features) that apply to most, if not all, of the n clusters? Regarding the first problem, the main contribution of the paper is the development of Algorithm CRH which uses geometric approximations (i.e., circles, rectangles, and convex hulls) to filter and select features. Highly scalable and incremental, Algorithm CRH can examine over 50,000 features and their spatial relationships with a given cluster in approximately one second of CPU time. Regarding the second problem, the key contribution is the development of Algorithm GenCom that makes use of concept generalization to effectively derive many meaningful commonalities that cannot be found otherwise.
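
The filtering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not Algorithm CRH itself, whose details are not given here; it only shows how cheap geometric approximations can discard distant features before any exact computation. The function names, the `threshold` parameter, and the sample data are hypothetical, the circle is centred on the centroid rather than being a minimum enclosing circle, and the convex hull stage is omitted.

```python
import math

def bounding_circle(points):
    """Centroid-centred circle enclosing all points (not the minimum circle)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    r = max(math.hypot(x - cx, y - cy) for x, y in points)
    return (cx, cy, r)

def bounding_rect(points):
    """Axis-aligned bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def circle_gap(a, b):
    """Separation between two circles; 0 if they touch or overlap."""
    d = math.hypot(a[0] - b[0], a[1] - b[1])
    return max(0.0, d - a[2] - b[2])

def rect_gap(a, b):
    """Separation between two axis-aligned rectangles; 0 if they overlap."""
    dx = max(0.0, a[0] - b[2], b[0] - a[2])
    dy = max(0.0, a[1] - b[3], b[1] - a[3])
    return math.hypot(dx, dy)

def filter_features(cluster, features, threshold):
    """Return names of features that survive both coarse filters.

    Each gap is a lower bound on the true distance between the cluster
    and the feature, so a gap above `threshold` safely rules a feature out.
    """
    c_circle = bounding_circle(cluster)
    c_rect = bounding_rect(cluster)
    survivors = []
    for name, polygon in features.items():
        # Stage 1 (assumed ordering): cheapest test first.
        if circle_gap(c_circle, bounding_circle(polygon)) > threshold:
            continue
        # Stage 2: tighter rectangle test on what remains.
        if rect_gap(c_rect, bounding_rect(polygon)) > threshold:
            continue
        survivors.append(name)
    return survivors

# Hypothetical data: a cluster of houses and two polygonal features.
houses = [(0, 0), (1, 0), (0, 1), (1, 1)]
landmarks = {
    "park": [(2, 0), (3, 0), (3, 1), (2, 1)],
    "mall": [(50, 50), (51, 50), (51, 51), (50, 51)],
}
print(filter_features(houses, landmarks, threshold=2.0))
```

Features that pass both filters would still need an exact, aggregate proximity computation against the points of the cluster; the sketch covers only the pruning step.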

[1] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami, “An Interval Classifier for Database Mining Applications,” Proc. 18th Conf. Very Large Databases, pp. 560–573, 1992.
[2] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. 1993 ACM-SIGMOD Int'l Conf. Management of Data, pp. 207-216, May 1993.
[3] F. Aurenhammer, "Voronoi Diagrams: A Survey of a Fundamental Geometric Data Structure," ACM Computing Surveys, vol. 23, no. 3, 1991, pp. 345-405.
[4] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles,” Proc. ACM SIGMOD Conf. Management of Data, 1990.
[5] H. Edelsbrunner, D.G. Kirkpatrick, and R. Seidel, "On the Shape of a Set of Points in the Plane," IEEE Trans. Information Theory, vol. 29, no. 4, pp. 551-559, July 1983.
[6] C. Faloutsos, T. Sellis, and N. Roussopoulos, “Analysis of Object Oriented Spatial Access Methods,” Proc. ACM SIGMOD Conf. Management of Data, 1987.
[7] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc. ACM SIGMOD Conf. Management of Data, 1984.
[8] J. Han, Y. Cai, and N. Cercone, “Knowledge Discovery in Databases: an Attribute-Oriented Approach,” Proc. 18th Conf. Very Large Databases, pp. 547–559, 1992.
[9] D.A. Keim, H.-P. Kriegel, and T. Seidl, “Supporting Data Mining of Large Databases by Visual Feedback Queries,” Proc. 10th Int'l Conf. Data Eng., pp. 302-313, 1994.
[10] E.M. Knorr, Efficiently Determining Aggregate Proximity Relationships in Spatial Data Mining, MSc thesis, Dept. of Computer Science, Univ. of British Columbia, 1995.
[11] E.M. Knorr and R.T. Ng, "Extraction of Spatial Proximity Patterns by Concept Generalization," Proc. Second KDD, pp. 347-350, Aug. 1996.
[12] W. Lu, J. Han, and B.C. Ooi, "Discovery of General Knowledge in Large Spatial Databases," Proc. Far East Workshop on Geographic Information Systems, Singapore, pp. 275-289, 1993.
[13] A. Melkman, "On-Line Construction of the Convex Hull of a Simple Polyline," Information Processing Letters, vol. 25, pp. 11-12, 1987.
[14] R.T. Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining," Proc. 20th Int'l Conf. Very Large Databases, Morgan Kaufmann, 1994, pp. 144-155.
[15] A. Okabe, B. Boots, and K. Sugihara, "Nearest Neighbourhood Operations with Generalized Voronoi Diagrams: A Review," Univ. of Tokyo, Dept. of Urban Engineering, Discussion Paper Series, no. 51, Sept. 1992.
[16] J. O'Rourke, Computational Geometry in C. Cambridge Univ. Press, 1993.
[17] F.P. Preparata and M.I. Shamos, Computational Geometry. Springer-Verlag, 1985.
[18] G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw Hill, New York, 1983.
[19] H. Samet, The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1990.
[20] T. Sellis, N. Roussopoulos, and C. Faloutsos, “The R+-Tree: A Dynamic Index for Multidimensional Objects,” Proc. 13th Int'l Conf. Very Large Data Bases (VLDB), 1987.
[21] D. Shasha and T.-L. Wang, “New Techniques for Best-Match Retrieval,” ACM Trans. Information Systems, vol. 8, no. 2, pp. 140-158, Apr. 1990.

Index Terms:
Spatial knowledge discovery, concept generalization, proximity relationships, geometric filtering, GIS.
Edwin M. Knorr, Raymond T. Ng, "Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 884-897, Dec. 1996, doi:10.1109/69.553156