Issue No.04 - April (2008 vol.20)

pp: 496-503

ABSTRACT

Semi-supervised clustering algorithms partition a given data set using limited supervision from the user. The success of these algorithms depends on the type of supervision and also on the kind of dissimilarity measure used while creating partitions of the space. This paper proposes a clustering algorithm that uses supervision in terms of relative comparisons, viz., x is closer to y than to z. The proposed clustering algorithm simultaneously learns the underlying dissimilarity measure while finding compact clusters in the given data set using relative comparisons. Through our experimental studies on high-dimensional textual data sets, we demonstrate that the proposed algorithm achieves higher accuracy and is more robust than similar algorithms using pairwise constraints for supervision.
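To make the notion of a relative comparison concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simple feature-weighted squared Euclidean dissimilarity, which may differ from the measure the paper learns. The names `weighted_sq_dist` and `violated` are hypothetical helpers introduced only for illustration. The sketch checks which comparisons of the form "x is closer to y than to z" are violated under a given weight vector.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (x, y, z) are indices into the data set: x should be closer to y than to z.
Triplet = Tuple[int, int, int]


def weighted_sq_dist(a: Vector, b: Vector, w: Vector) -> float:
    """Feature-weighted squared Euclidean dissimilarity (illustrative assumption)."""
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))


def violated(points: Sequence[Vector], triplets: Sequence[Triplet], w: Vector) -> List[Triplet]:
    """Return the triplets (x, y, z) for which d(x, y) >= d(x, z) under weights w."""
    return [
        (x, y, z)
        for x, y, z in triplets
        if weighted_sq_dist(points[x], points[y], w)
        >= weighted_sq_dist(points[x], points[z], w)
    ]


# Toy data: three 2-D points and one user-supplied relative comparison.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 2.0)]
triplets = [(0, 1, 2)]  # "point 0 is closer to point 1 than to point 2"

uniform = violated(points, triplets, w=(1.0, 1.0))     # plain squared Euclidean
reweighted = violated(points, triplets, w=(0.1, 1.0))  # first feature down-weighted
```

With uniform weights the comparison is violated, while down-weighting the first feature satisfies it. This is the sense in which adjusting the dissimilarity measure can bring the geometry into agreement with the supervision; how the weights are actually learned alongside the clusters is specific to the paper and is not shown here.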

INDEX TERMS

Semi-supervised learning, Clustering, Dissimilarity Measures, Constraint-based Clustering

CITATION

Nimit Kumar, Krishna Kummamuru, "Semisupervised Clustering with Metric Learning using Relative Comparisons",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 4, pp. 496-503, April 2008, doi:10.1109/TKDE.2007.190715