A Human-Computer Interactive Method for Projected Clustering
April 2004 (vol. 16 no. 4)
pp. 448-460

Abstract—Clustering is a central task in data mining applications such as customer segmentation. High-dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Therefore, techniques have recently been proposed to find clusters in hidden subspaces of the data. However, since the behavior of the data can vary considerably in different subspaces, it is often difficult to define the notion of a cluster with the use of simple mathematical formalizations. The widely used practice of treating clustering as the exact problem of optimizing an arbitrarily chosen objective function can often lead to misleading results. In fact, the proper clustering definition may vary not only with the application and data set but also with the perceptions of the end user. This makes it difficult to separate the definition of the clustering problem from the perception of an end-user. In this paper, we propose a system which performs high-dimensional clustering by cooperation between the human and the computer. The complex task of cluster creation is accomplished through a combination of human intuition and the computational support provided by the computer. The result is a system which leverages the best abilities of both the human and the computer for solving the clustering problem.

Index Terms:
High-dimensional data mining, clustering, human-computer interaction.
Citation:
Charu C. Aggarwal, "A Human-Computer Interactive Method for Projected Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 4, pp. 448-460, April 2004, doi:10.1109/TKDE.2004.1269669