On Weighting Clustering
August 2006 (vol. 28, no. 8), pp. 1223-1235
Recent papers and patents in iterative unsupervised learning have emphasized a new trend in clustering. It consists of penalizing solutions via weights on the instance points, thereby steering clustering toward the points that are hardest to cluster. The motivation comes principally from an analogy with a powerful family of supervised classification methods known as boosting algorithms. However, support for this analogy has so far come mainly from experimental studies. This paper is, to the best of our knowledge, the first attempt at its formalization. More precisely, we cast clustering as a constrained minimization of a Bregman divergence. Weight modifications rely on the local variations of the expected complete log-likelihoods. Theoretical results show benefits resembling those of boosting algorithms and yield modified (weighted) versions of clustering algorithms such as $k$-means, fuzzy $c$-means, Expectation Maximization (EM), and $k$-harmonic means. Experiments are provided for all these algorithms, with readily available code, and demonstrate the advantages that subtle data reweighting can bring to clustering.
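For context, the abstract's central object is standard: for a strictly convex, differentiable generator $F$, the Bregman divergence between points $\mathbf{x}$ and $\mathbf{y}$ is

$$D_F(\mathbf{x}, \mathbf{y}) = F(\mathbf{x}) - F(\mathbf{y}) - \langle \mathbf{x} - \mathbf{y}, \nabla F(\mathbf{y}) \rangle,$$

which recovers the squared Euclidean distance for $F(\mathbf{x}) = \|\mathbf{x}\|_2^2$ (the $k$-means case) and the Kullback-Leibler divergence for the negative Shannon entropy.

Below is a minimal sketch of the weighted-clustering idea the abstract describes: each instance point carries a weight that is raised, boosting-style, when the current clustering fits it poorly, so centroids drift toward the hardest points. The multiplicative update with step `eta` in `boost_weights` is a hypothetical illustration of the general principle, not the authors' exact rule (which is derived from local variations of the expected complete log-likelihood).

```python
import numpy as np

def weighted_kmeans(X, k, w, n_iter=100, seed=0):
    """Lloyd-style k-means where each point x_i carries a weight w_i.

    Centroids are weighted means of their assigned points; squared
    Euclidean distance is the Bregman divergence of F(x) = ||x||^2.
    """
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # initial centroids
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        a = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        # Update step: weighted centroid of each nonempty cluster.
        C = np.array([np.average(X[a == j], axis=0, weights=w[a == j])
                      if np.any(a == j) else C[j] for j in range(k)])
    return C, a

def boost_weights(X, C, a, w, eta=0.5):
    """Hypothetical boosting-style reweighting (illustrative only):
    multiplicatively increase the weight of poorly clustered points."""
    loss = ((X - C[a]) ** 2).sum(-1)          # per-point clustering loss
    w = w * np.exp(eta * loss / loss.max())   # emphasize hard points
    return w * (len(w) / w.sum())             # keep total weight mass fixed

# Usage: alternate clustering with reweighting for a few rounds.
X = np.random.default_rng(1).normal(size=(300, 2))
w = np.ones(len(X))
for _ in range(5):
    C, a = weighted_kmeans(X, k=3, w=w)
    w = boost_weights(X, C, a, w)
```

The renormalization in `boost_weights` keeps the total weight mass constant, so reweighting redistributes influence among points rather than inflating the objective.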

Index Terms:
Clustering, Bregman divergences, $k$-means, fuzzy $k$-means, expectation maximization, harmonic means clustering.
Citation:
Richard Nock, Frank Nielsen, "On Weighting Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1223-1235, Aug. 2006, doi:10.1109/TPAMI.2006.168