Issue No. 10, Oct. 2012 (vol. 24)
ISSN: 1041-4347
pp: 1848-1861
Wolf Siberski, L3S Research Center, Hannover
Odysseas Papapetrou, Technical University of Crete, Chania
Norbert Fuhr, University of Duisburg-Essen, Duisburg
ABSTRACT
Text clustering is an established technique for improving retrieval quality in information retrieval, in both centralized and distributed environments. However, traditional text clustering algorithms fail to scale in highly distributed environments such as peer-to-peer networks. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach to assign documents to clusters. It allows a peer to compare each of its documents with only a few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document's assignment to a cluster. An extensive experimental evaluation with up to 1 million peers and 1 million documents demonstrates the scalability and effectiveness of the algorithm.
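The abstract's central idea is that each document is compared with only a few selected clusters instead of all of them. The sketch below illustrates that general pattern with a generic technique, term-based candidate pruning followed by exact cosine comparison. It is not the authors' algorithm: the selection rule (matching a document's highest-weight terms against each centroid's highest-weight terms), the parameters `top_terms` and `doc_terms`, and all function names are hypothetical, and the sketch models neither the peer-to-peer distribution nor the probabilistic correctness guarantees described in the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_term_index(centroids, top_terms=3):
    """Hypothetical index: map each term to the clusters whose centroid
    ranks that term among its `top_terms` highest weights."""
    index = defaultdict(set)
    for cid, centroid in centroids.items():
        for term in sorted(centroid, key=centroid.get, reverse=True)[:top_terms]:
            index[term].add(cid)
    return index


def assign(doc, centroids, index, doc_terms=3):
    """Compare `doc` only with candidate clusters reached through its
    `doc_terms` highest-weight terms; return (best cluster, #candidates)."""
    candidates = set()
    for term in sorted(doc, key=doc.get, reverse=True)[:doc_terms]:
        candidates |= index.get(term, set())
    if not candidates:
        # No candidate found; a real system would need a fallback here.
        return None, 0
    best = max(candidates, key=lambda cid: cosine(doc, centroids[cid]))
    return best, len(candidates)


centroids = {
    "sports": {"match": 0.9, "goal": 0.8, "team": 0.7, "ball": 0.2},
    "finance": {"stock": 0.9, "market": 0.8, "bank": 0.6, "team": 0.1},
    "music": {"band": 0.9, "album": 0.8, "concert": 0.7},
}
index = build_term_index(centroids)
doc = {"goal": 0.7, "team": 0.5, "stock": 0.1}
print(assign(doc, centroids, index))
```

Here the document reaches only the "sports" and "finance" clusters through its top terms, so the "music" centroid is never compared; the exact cosine step then picks "sports". The trade-off is that pruning can miss the truly best cluster, which is the kind of error the paper's probabilistic guarantees are meant to bound.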
INDEX TERMS
Clustering algorithms, Peer to peer computing, Probabilistic logic, Frequency estimation, Indexing, Computational modeling, Text clustering, Distributed clustering
CITATION
Wolf Siberski, Odysseas Papapetrou, Norbert Fuhr, "Decentralized Probabilistic Text Clustering", IEEE Transactions on Knowledge & Data Engineering, vol. 24, no. 10, pp. 1848-1861, Oct. 2012, doi:10.1109/TKDE.2011.120