2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (2014)
Beijing, China
July 13, 2014 to July 15, 2014
ISSN: 2168-3034
ISBN: 978-1-4799-3844-5
pp: 63-68
ABSTRACT
Document clustering is a popular research topic that aims to partition a corpus into subgroups of homogeneous documents. Traditional clustering approaches generally do not consider how word weights relate to clusters. To address this problem, we propose an Adaptive Centroid-based Clustering (ACC) algorithm. The Class-Feature-Centroid (CFC) algorithm, a successful supervised centroid-based classifier, takes relationships among words into account. ACC employs this discriminative CFC vector to drive the clustering procedure. Because clustering is unsupervised, ACC begins with hundreds of small clusters so that acceptable CFC vectors can be computed, and then iteratively regroups documents into clusters until convergence. Since ACC is self-organized, it determines the number of clusters adaptively. Experimental results show that ACC achieves performance competitive with state-of-the-art clustering approaches.
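The abstract describes ACC only at a high level, so the Python sketch below illustrates one plausible reading of the procedure rather than the authors' actual method. Every detail beyond the abstract is an assumption: documents are token lists; the initial partition assigns documents uniformly at random to k0 small clusters; each cluster's centroid uses a CFC-style term weight b^(DF/|c|) * log(K/CF); and each document moves to the cluster whose centroid has the highest cosine similarity. The functions cfc_centroids, cosine, and acc_sketch and the parameters b, k0, max_iter, and seed are hypothetical. Clusters that lose all their members disappear, which is how this sketch lets the number of clusters shrink adaptively.

```python
import math
import random
from collections import Counter


def cfc_centroids(docs, labels, b=2.0):
    """CFC-style centroid for every non-empty cluster (hypothetical form).

    weight(t, c) = b ** (DF_c(t) / |c|) * log(K / CF(t)), where DF_c(t) is the
    number of documents in cluster c containing term t, |c| is the cluster
    size, K is the number of non-empty clusters, and CF(t) is the number of
    clusters in which t occurs.
    """
    clusters = {}
    for doc, lab in zip(docs, labels):
        clusters.setdefault(lab, []).append(set(doc))  # set: count each term once per doc
    k = len(clusters)
    df = {c: Counter(t for d in members for t in d) for c, members in clusters.items()}
    cf = Counter(t for counts in df.values() for t in counts)
    return {
        c: {t: (b ** (n / len(clusters[c]))) * math.log(k / cf[t]) for t, n in df[c].items()}
        for c in clusters
    }


def cosine(doc_tf, centroid):
    """Cosine similarity between a term-frequency dict and a centroid dict."""
    dot = sum(w * centroid.get(t, 0.0) for t, w in doc_tf.items())
    norm_d = math.sqrt(sum(w * w for w in doc_tf.values()))
    norm_c = math.sqrt(sum(w * w for w in centroid.values()))
    if norm_d == 0 or norm_c == 0:
        return 0.0  # all-zero centroid, e.g. when a single cluster remains
    return dot / (norm_d * norm_c)


def acc_sketch(docs, k0=100, b=2.0, max_iter=50, seed=0):
    """Start from k0 random small clusters and regroup until labels stop changing."""
    rng = random.Random(seed)
    labels = [rng.randrange(k0) for _ in docs]
    tfs = [Counter(d) for d in docs]
    for _ in range(max_iter):
        centroids = cfc_centroids(docs, labels, b)
        new_labels = []
        for tf, cur in zip(tfs, labels):
            # Keep the current cluster unless another one is strictly more similar.
            best, best_sim = cur, cosine(tf, centroids[cur])
            for c, cen in centroids.items():
                sim = cosine(tf, cen)
                if sim > best_sim:
                    best, best_sim = c, sim
            new_labels.append(best)
        if new_labels == labels:  # converged: no document moved
            break
        labels = new_labels  # emptied clusters simply vanish in the next round
    return labels


docs = [["ball", "goal", "team"], ["goal", "team", "match"],
        ["vote", "party", "law"], ["law", "vote", "senate"]]
labels = acc_sketch(docs, k0=4)
print(labels, "clusters:", len(set(labels)))
```

The call returns one cluster label per document, and len(set(labels)) gives the number of clusters the sketch settled on. Keeping the current label on ties stops documents from bouncing between clusters whose centroids are all zero, which happens when only one cluster remains, because every term then has log(K/CF) = log(1) = 0. Convergence is not guaranteed in general, so max_iter bounds the loop.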
INDEX TERMS
Clustering algorithms, Vectors, Entropy, Partitioning algorithms, Measurement, Frequency modulation, Algorithm design and analysis
CITATION
Ximing Li, Jihong Ouyang, Xiaotang Zhou, Bo Fu, "Adaptive Centroid-Based Clustering Algorithm for Text Document Data", 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 63-68, 2014, doi:10.1109/PAAP.2014.13