2009 50th Annual IEEE Symposium on Foundations of Computer Science (2009)

Atlanta, GA

Oct. 25, 2009 to Oct. 27, 2009

ISSN: 0272-5428

ISBN: 978-1-4244-5116-6

pp: 405-414

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/FOCS.2009.14

ABSTRACT

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
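The setting described in the abstract can be illustrated with a minimal sketch, which is not the paper's construction or proof. It assumes the k-means method means the usual alternation of assignment and mean-update steps (Lloyd's algorithm), perturbs every input coordinate with independent Gaussian noise of standard deviation sigma, and counts iterations until the clustering stops changing. The function names `perturb` and `kmeans_iterations`, the sample points, and the choice of initial centers are illustrative assumptions.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise (std. dev. sigma) to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_iterations(points, centers, max_iter=10_000):
    """Run the k-means method; return (centers, iterations that changed the clustering)."""
    centers = list(centers)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            # Clustering is stable, so the method has converged.
            return centers, iteration - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center (an assumption)
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, max_iter


# Hypothetical input: an arbitrary point set, then its Gaussian perturbation.
rng = random.Random(0)
base = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
noisy = perturb(base, sigma=0.1, rng=rng)
centers, iters = kmeans_iterations(noisy, centers=noisy[:2])
print(iters, centers)
```

Averaging the iteration count over many random perturbations of a fixed input gives an empirical stand-in for the smoothed number of iterations that the abstract bounds by a polynomial in n and 1/σ.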

INDEX TERMS

computational complexity, Gaussian processes, pattern clustering

CITATION

D. Arthur, B. Manthey and H. Röglin, "k-Means Has Polynomial Smoothed Complexity,"

*2009 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS)*, Atlanta, Georgia, 2009, pp. 405-414.

doi:10.1109/FOCS.2009.14
