2010 IEEE International Conference on Data Mining Workshops (2010)
Dec. 13, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2010.53
The data explosion in many applications calls for efficient data mining solutions. Emerging technologies such as grid and cloud computing, high-performance multi-core processors, and graphics processing units offer the potential to keep pace with this growth and open up new opportunities for designing efficient algorithms. In this paper, we propose a parallel variant of the Expectation Maximization (EM) algorithm suitable for clustering large data sets in a distributed environment. The conventional EM algorithm iterates sequentially over two phases: in the E-step, points are assigned to the clusters, and in the M-step, the cluster models are updated. The basic idea of our approach is to allow asynchronous model updates for faster convergence and better use of the available resources. The frequency of the updates can be flexibly adjusted to the specific characteristics of the environment, including communication costs and the computing power of the individual devices. An extensive experimental evaluation demonstrates the benefits of our approach.
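To make the E-step/M-step cycle and the idea of more frequent model updates concrete, the following is a minimal single-process sketch, not the algorithm proposed in the paper. It assumes a one-dimensional Gaussian mixture and swaps in plain incremental EM over data blocks: each block stands in for the data held by one device, and the hypothetical parameter `update_every` controls how many blocks are processed before the model is refreshed. All function and parameter names, the evenly spaced initialization, and the toy data are assumptions made for illustration.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(block, weights, means, variances):
    """E-step on one block: soft-assign points and return per-cluster
    sufficient statistics (sum r, sum r*x, sum r*x^2)."""
    k = len(means)
    s0, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in block:
        dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        for j in range(k):
            r = dens[j] / total  # responsibility of cluster j for point x
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return s0, s1, s2

def m_step(block_stats, k, min_var=1e-6):
    """M-step: rebuild the model from the latest statistics of all visited blocks."""
    visited = [s for s in block_stats if s is not None]
    s0 = [max(sum(s[0][j] for s in visited), 1e-12) for j in range(k)]
    s1 = [sum(s[1][j] for s in visited) for j in range(k)]
    s2 = [sum(s[2][j] for s in visited) for j in range(k)]
    n = sum(s0)
    weights = [a / n for a in s0]
    means = [b / a for a, b in zip(s0, s1)]
    variances = [max(c / a - m * m, min_var) for a, c, m in zip(s0, s2, means)]
    return weights, means, variances

def fit(points, k, n_blocks, update_every, passes):
    """Block-wise EM. update_every == n_blocks gives one M-step per full pass
    (conventional EM); smaller values refresh the model more often."""
    # Assumed initialization: means spread evenly over the data range.
    lo, hi = min(points), max(points)
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    center = sum(points) / len(points)
    spread = sum((x - center) ** 2 for x in points) / len(points)
    variances = [max(spread, 1e-6)] * k
    weights = [1.0 / k] * k
    blocks = [points[i::n_blocks] for i in range(n_blocks)]
    block_stats = [None] * n_blocks  # most recent statistics per block
    for _ in range(passes):
        for b, block in enumerate(blocks):
            block_stats[b] = e_step(block, weights, means, variances)
            if (b + 1) % update_every == 0:
                weights, means, variances = m_step(block_stats, k)
    return weights, means, variances

# Toy data (an assumption): two well-separated clusters around 0 and 10.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
conventional = fit(data, k=2, n_blocks=4, update_every=4, passes=20)
frequent = fit(data, k=2, n_blocks=4, update_every=1, passes=20)
```

With `update_every=4` the model changes only after all four blocks have been processed, which corresponds to the conventional sequential scheme. With `update_every=1` each block's E-step already sees the model refreshed by the previous block. In a real distributed setting the blocks would be processed concurrently and every update would incur communication, which is the trade-off the adjustable update frequency addresses; this sketch does not model it.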
C. Plant and C. Böhm, "Parallel EM-Clustering: Fast Convergence by Asynchronous Model Updates," 2010 IEEE International Conference on Data Mining Workshops (ICDMW), Sydney, Australia, 2010, pp. 178-185.