Dual Clustering: Integrating Data Clustering over Optimization and Constraint Domains
May 2005 (vol. 17 no. 5)
pp. 628-637
Spatial clustering has attracted considerable research attention because of its many applications. In most conventional clustering problems, the similarity measure mainly considers geometric attributes. In many real applications, however, users are primarily concerned with the nongeometric attributes. Conventional spatial clustering partitions the input data set into several compact regions, so data points that are similar to one another in their nongeometric attributes may be scattered over different regions, which makes the corresponding objective difficult to achieve. To remedy this, we propose and explore in this paper a new clustering problem on two domains, called dual clustering, in which one domain is the optimization domain and the other is the constraint domain. Attributes in the optimization domain are those involved in optimizing the objective function, while those in the constraint domain specify the application-dependent constraints. Our goal is to optimize the objective function in the optimization domain while satisfying the constraint specified in the constraint domain. We devise an efficient and effective algorithm, named Interlaced Clustering-Classification (ICC), to solve this problem. The ICC algorithm combines the information in both domains by iteratively performing a clustering algorithm on the optimization domain and a classification algorithm on the constraint domain to reach the target clustering effectively. The time and space complexities of the ICC algorithm are formally analyzed. Several experiments are conducted to provide insights into the dual clustering problem and the proposed algorithm.
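
The interlacing idea described in the abstract can be illustrated with a small sketch. This is not the ICC algorithm from the paper: the abstract does not say which clustering or classification methods are used, so the sketch assumes a k-means-style assignment on the optimization domain and a nearest-neighbor majority vote on the constraint domain as stand-ins. The function name `interlaced_sketch`, its parameters, and the toy data are all hypothetical.

```python
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def interlaced_sketch(opt, con, k, neighbors=3, rounds=10):
    """Alternate a clustering step and a classification step.

    opt[i] holds the optimization-domain attributes of point i,
    con[i] holds its constraint-domain attributes (assumed layout).
    Requires rounds >= 1.
    """
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [list(opt[0])]
    while len(centers) < k:
        far = max(opt, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))

    previous = None
    for _ in range(rounds):
        # Clustering step: nearest center in the optimization domain.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in opt]

        # Classification step: each point takes the majority label of its
        # nearest neighbors (itself included) in the constraint domain.
        voted = []
        for q in con:
            order = sorted(range(len(con)), key=lambda j: dist2(q, con[j]))
            votes = Counter(labels[j] for j in order[:neighbors])
            voted.append(votes.most_common(1)[0][0])

        if voted == previous:  # labels stopped changing
            break
        previous = voted

        # Recompute centers from the voted labels; keep empty clusters as-is.
        for c in range(k):
            members = [opt[i] for i, lab in enumerate(voted) if lab == c]
            if members:
                centers[c] = mean(members)
    return previous


# Toy data: two spatial groups; point 3 sits in the first group spatially
# but has an optimization value typical of the second group.
opt = [[1.0], [1.1], [0.9], [5.0], [1.0], [5.0], [5.1], [4.9], [5.2], [5.0]]
con = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
       [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]]
labels = interlaced_sketch(opt, con, k=2)
print(labels)
```

In this toy run, clustering on the optimization attribute alone would place point 3 with the second group, while the constraint-domain vote keeps it with its spatial neighbors. How the actual ICC algorithm balances the two domains is defined in the paper itself, not by this sketch.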

Index Terms:
Data mining, data clustering, dual clustering.
Citation:
Cheng-Ru Lin, Ken-Hao Liu, Ming-Syan Chen, "Dual Clustering: Integrating Data Clustering over Optimization and Constraint Domains," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 628-637, May 2005, doi:10.1109/TKDE.2005.75