Semisupervised Clustering with Metric Learning using Relative Comparisons
April 2008 (vol. 20 no. 4)
pp. 496-503
Semi-supervised clustering algorithms partition a given data set using limited supervision from the user. The success of these algorithms depends on the type of supervision and on the dissimilarity measure used to partition the space. This paper proposes a clustering algorithm that takes supervision in the form of relative comparisons, viz., x is closer to y than to z. Using these relative comparisons, the proposed algorithm learns the underlying dissimilarity measure while simultaneously finding compact clusters in the given data set. Through experimental studies on high-dimensional textual data sets, we demonstrate that the proposed algorithm achieves higher accuracy and is more robust than similar algorithms that use pairwise constraints for supervision.
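
To make the notion of a relative comparison concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes a per-feature weighted squared Euclidean dissimilarity (the abstract does not specify the measure) and uses the hypothetical helpers `weighted_sq_dist` and `violated` to check which comparisons of the form "x is closer to y than to z" a given set of weights fails to satisfy.

```python
from typing import List, Sequence, Tuple

# (x, y, z) holds indices into the data set and reads "x is closer to y than to z".
Triplet = Tuple[int, int, int]


def weighted_sq_dist(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Squared Euclidean distance with one non-negative weight per feature.

    Assumption: this form of dissimilarity is chosen only for illustration.
    """
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))


def violated(points: Sequence[Sequence[float]],
             triplets: Sequence[Triplet],
             w: Sequence[float]) -> List[Triplet]:
    """Return the relative comparisons that the weighted measure does not satisfy."""
    bad = []
    for x, y, z in triplets:
        d_xy = weighted_sq_dist(points[x], points[y], w)
        d_xz = weighted_sq_dist(points[x], points[z], w)
        if d_xy >= d_xz:  # x is NOT strictly closer to y than to z
            bad.append((x, y, z))
    return bad


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 2.0)]
triplets = [(0, 1, 2)]  # "point 0 is closer to point 1 than to point 2"

# With equal weights the comparison is violated; down-weighting the
# first feature yields a measure that satisfies it.
uniform = violated(points, triplets, (1.0, 1.0))
reweighted = violated(points, triplets, (0.1, 1.0))
```

In this toy case the supervision is satisfied only after the feature weights change, which is the kind of adjustment a metric-learning step would have to discover from the comparisons.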

Index Terms:
Semi-supervised learning, Clustering, Dissimilarity Measures, Constraint-based Clustering
Citation:
Nimit Kumar, Krishna Kummamuru, "Semisupervised Clustering with Metric Learning using Relative Comparisons," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 4, pp. 496-503, April 2008, doi:10.1109/TKDE.2007.190715