A Genetic Algorithm Using Hyper-Quadtrees for Low-Dimensional K-means Clustering
April 2006 (vol. 28 no. 4)
pp. 533-543
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k cluster centers so as to minimize the sum of the squared Euclidean distances between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
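For orientation, the baseline procedure the abstract describes (Lloyd's k-means, whose sensitivity to initial centers motivates the paper's GA) can be sketched as below. This is an illustrative sketch, not the authors' implementation; the function name `kmeans` and the deliberately naive "first k points" seeding are assumptions for the example.

```python
def kmeans(points, k, iters=100):
    """Lloyd's k-means sketch: points is a list of d-dimensional tuples.

    Returns (centers, sse), where sse is the sum of squared Euclidean
    distances from each point to its nearest center -- the objective
    the abstract defines. Seeding here is deliberately naive (first k
    points); improving this seeding is what the paper's GA addresses.
    """
    centers = list(points[:k])  # naive deterministic seeding (assumption)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Update step: move each center to its cluster's centroid;
        # an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged to a local optimum
            break
        centers = new_centers
    sse = sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points)
    return centers, sse
```

Because the centers only descend to a local optimum of the SSE objective, the final partition depends entirely on the seeding, which is the failure mode the hyper-quadtree GA is designed to escape.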

[1] G.P. Babu and M.N. Murty, “A Near-Optimal Initial Seed Value Selection for k-Means Algorithm Using Genetic Algorithm,” Pattern Recognition Letters, vol. 14, pp. 763-769, 1993.
[2] S. Bandyopadhyay and U. Maulik, “Non-Parametric Genetic Clustering: Comparison of Validity Indices,” IEEE Trans. Systems, Man, and Cybernetics, Part-C, vol. 31, no. 1, pp. 120-125, 2001.
[3] S. Bandyopadhyay and U. Maulik, “An Evolutionary Technique Based on K-Means Algorithm for Optimal Clustering in R^N,” Information Sciences-Applications: An Int'l J., vol. 146, pp. 221-237, Oct. 2002.
[4] J.N. Bhuyan, V.V. Raghavan, and K.E. Venkatesh, “Genetic Algorithm for Clustering with an Ordered Representation,” Proc. Fourth Int'l Conf. Genetic Algorithms, pp. 408-415, 1991.
[5] L. Bottou and Y. Bengio, “Convergence Properties of the k-Means Algorithms,” Advances in Neural Information Processing Systems, 7, G. Tesauro and D. Touretzky, eds., pp. 585-592, MIT Press, 1995.
[6] Y.T. Chien, Interactive Pattern Recognition. New York: Marcel-Dekker, 1978.
[7] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, Mass.: Addison-Wesley, 1989.
[8] G. Hamerly and C. Elkan, “Alternatives to the k-Means Algorithm that Find Better Clusterings,” Proc. 11th Int'l Conf. Information and Knowledge Management, Nov. 2002.
[9] G. Hamerly and C. Elkan, “Learning the k in k-Means,” Proc. 17th Ann. Conf. Neural Information Processing Systems (NIPS), Dec. 2003.
[10] P. Hansen and N. Mladenovic, “J-Means: A New Local Search Heuristic for Minimum Sum-of-Squares Clustering,” Pattern Recognition, vol. 34, no. 2, pp. 405-413, 2001.
[11] D.S. Hochbaum and D.B. Shmoys, “A Best Possible Heuristic for the k-Center Problem,” Math. Operations Research, vol. 10, no. 2, pp. 180-184, 1985.
[12] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, N.J.: Prentice Hall, 1988.
[13] A.K. Jain, M.N. Murty, and P.J. Flynn, “Data Clustering: A Review,” ACM Computing Surveys, vol. 31, no. 3, 1999.
[14] D. Jones and M.A. Beltramo, “Solving Partitioning Problems with Genetic Algorithms,” Proc. Fourth Int'l Conf. Genetic Algorithms, pp. 442-449, 1991.
[15] T. Kanungo, D.M. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A.Y. Wu, “An Efficient k-Means Clustering Algorithm: Analysis and Implementation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 881-892, 2002.
[16] K. Krishna and M.N. Murty, “Genetic k-Means Algorithm,” IEEE Trans. Systems, Man, and Cybernetics, vol. 29, no. 3, pp. 433-439, June 1999.
[17] O.L. Mangasarian, “Mathematical Programming in Data Mining,” Data Mining and Knowledge Discovery, vol. 1, pp. 183-201, 1997.
[18] U. Maulik and S. Bandyopadhyay, “Genetic Algorithm-Based Clustering Technique,” Pattern Recognition, vol. 33, pp. 1455-1465, 2000.
[19] J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” Proc. Fifth Berkeley Symp. Math. Statistics and Probability, pp. 281-297, 1967.
[20] J. Peña, J. Lozano, and P. Larrañaga, “An Empirical Comparison of Four Initialization Methods for the k-Means Algorithm,” Pattern Recognition Letters, vol. 20, pp. 1027-1040, 1999.
[21] D. Pollard, “A Central Limit Theorem for k-Means Clustering,” Annals of Probability, vol. 10, pp. 919-926, 1982.
[22] S.Z. Selim and M.A. Ismail, “K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, pp. 81-87, 1984.
[23] H. Samet, The Design and Analysis of Spatial Data Structures. Reading, Mass.: Addison-Wesley, 1990.
[24] H. Spath, Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Chichester: Ellis Horwood, 1980.

Index Terms:
k-means algorithm, clustering, genetic algorithms, quadtrees, optimal partition, center selection.
Michael Laszlo, Sumitra Mukherjee, "A Genetic Algorithm Using Hyper-Quadtrees for Low-Dimensional K-means Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 533-543, April 2006, doi:10.1109/TPAMI.2006.66