Fifth IEEE International Conference on Data Mining (ICDM'05)
Effective and Efficient Distributed Model-Based Clustering
Houston, Texas
November 27-November 30
ISBN: 0-7695-2278-5
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2005.53
In many companies, data is distributed among several sites, i.e., each site generates its own data and manages its own data repository. Analyzing and mining these distributed sources requires distributed data mining techniques to find global patterns representing the complete information. Transmitting the entire local data set is often unacceptable because of performance considerations, privacy and security aspects, and bandwidth constraints. Traditional data mining algorithms, which demand access to the complete data, are not appropriate for distributed applications. Thus, there is a need for distributed data mining algorithms to analyze and discover new knowledge in distributed environments. One of the most important data mining tasks is clustering, which aims at detecting groups of similar data objects. In this paper, we propose a distributed model-based clustering algorithm that uses EM for detecting local models in terms of mixtures of Gaussian distributions. We propose an efficient and effective algorithm for deriving and merging these local Gaussian distributions to generate a meaningful global model. In a broad experimental evaluation, we show that our framework is scalable in a highly distributed environment.
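The general idea in the abstract (fit a Gaussian mixture at each site with EM, send only the model parameters, then combine them into a global model) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how local Gaussians are derived or merged. The sketch assumes 1-D data, a simple quantile-based EM initialisation, a pooling step that reweights each local component by its site's share of the data, and a hypothetical merge criterion (means closer than a fixed threshold) with a moment-matched merge. All function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_local_em(data, k, iters=50):
    """Fit a 1-D mixture of k Gaussians with plain EM at one site.

    Returns a list of (weight, mean, variance) tuples.
    """
    xs = sorted(data)
    n = len(xs)
    # Assumed initialisation: means at evenly spaced quantiles of the local data.
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n or 1.0
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid collapse
    return list(zip(weights, means, variances))


def pool_local_models(local_models, site_sizes):
    """Union of all local components, reweighted by each site's data share."""
    n_total = sum(site_sizes)
    return [(w * n / n_total, m, v)
            for model, n in zip(local_models, site_sizes)
            for (w, m, v) in model]


def merge_pair(c1, c2):
    """Moment-matched merge of two weighted Gaussians into one."""
    w1, m1, v1 = c1
    w2, m2, v2 = c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def merge_close_components(components, threshold):
    """Greedily merge components whose means differ by less than threshold.

    The threshold rule is a hypothetical stand-in for a real similarity test.
    """
    merged = []
    for c in sorted(components, key=lambda comp: comp[1]):
        if merged and abs(c[1] - merged[-1][1]) < threshold:
            merged[-1] = merge_pair(merged[-1], c)
        else:
            merged.append(c)
    return merged
```

In this sketch only the `(weight, mean, variance)` tuples and the site sizes leave a site, so the raw data never has to be transmitted. A call such as `merge_close_components(pool_local_models(models, sizes), threshold)` then yields one global mixture whose weights still sum to one.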
Citation:
Hans-Peter Kriegel, Peer Kröger, Alexey Pryakhin, Matthias Schubert, "Effective and Efficient Distributed Model-Based Clustering," icdm, pp. 258-265, Fifth IEEE International Conference on Data Mining (ICDM'05), 2005