Issue No. 01 - January-March (2009 vol. 6)
Clustering datasets is a challenging problem that arises in a wide array of applications. Partition-optimization approaches, such as the k-means or expectation-maximization (EM) algorithms, are not guaranteed to reach the global optimum and converge to solutions in the vicinity of their initialization. This paper proposes a staged approach to specifying initial values: first find a large number of local modes, then obtain representatives from the most separated ones. Results on test experiments are excellent. We also provide a detailed comparative assessment of the suggested algorithm against many initialization approaches commonly used in the literature. Finally, the methodology is applied to two datasets, one on diurnal microarray gene expressions and one on industrial releases of mercury.
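The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: as a stand-in for its mode-finding stage, the sketch runs a few Lloyd (k-means) iterations with many more centers than clusters, and as a stand-in for its selection stage, it uses a greedy farthest-first rule. The function names (`lloyd`, `farthest_first`, `staged_init`), the parameter `n_modes`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lloyd(points, centers, iters=10):
    """Run a few k-means (Lloyd) iterations; return the non-empty cluster means."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[j].append(p)
        # Centers that attract no points are dropped rather than re-seeded.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups if g]
    return centers


def farthest_first(candidates, k):
    """Greedily keep k candidates that are far from those already chosen."""
    if len(candidates) < k:
        raise ValueError("fewer candidate modes than requested clusters")
    chosen = [candidates[0]]
    while len(chosen) < k:
        nxt = max(candidates, key=lambda c: min(dist(c, s) for s in chosen))
        chosen.append(nxt)
    return chosen


def staged_init(points, k, n_modes, seed=0):
    """Stage 1: find many local modes. Stage 2: keep k well-separated ones."""
    rng = random.Random(seed)
    modes = lloyd(points, rng.sample(points, n_modes))
    return farthest_first(modes, k)


# Hypothetical data: three well-separated 2-D Gaussian blobs, 50 points each.
rng = random.Random(1)
truth = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in truth for _ in range(50)]

init = staged_init(data, k=3, n_modes=30)
print(init)  # three starting centers to hand to k-means or EM
```

The returned centers would then be passed as starting values to k-means or EM. Oversampling the modes (here 30 for 3 clusters) is the design choice that matters in this sketch: it makes it unlikely that a true group is left without a nearby candidate before the separation step runs.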
Clustering, classification, and association rules, Statistical methods, Singular value decomposition, Multivariate statistics
R. Maitra, "Initializing Partition-Optimization Algorithms," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 1, pp. 144-157, 2009.