2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (2014)
July 13, 2014 to July 15, 2014
DOI: http://doi.ieeecomputersociety.org/10.1109/PAAP.2014.13
Document clustering is a popular research topic that aims to partition a corpus into subgroups of homogeneous documents. Traditional clustering approaches generally do not consider how word weights differ across clusters. To address this problem, we propose an Adaptive Centroid-based Clustering (ACC) algorithm. The Class-Feature-Centroid (CFC) algorithm, a successful supervised centroid-based classifier, takes the relationships among words into account. ACC uses this discriminative CFC vector to drive the clustering procedure. Because clustering is unsupervised, ACC begins with hundreds of small clusters, which yield acceptable CFC vectors, and then iteratively regroups the documents into clusters until convergence. Because ACC is self-organized, it can determine the number of clusters adaptively. The experimental results show that ACC achieves performance competitive with state-of-the-art clustering approaches.
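The abstract describes the ACC loop only at a high level. The sketch below is one possible reading of it and is not the authors' implementation. It makes several assumptions. It uses a CFC-style centroid weight of the form b^(DF_t^j / |C_j|) * log(|C| / CF_t), where DF_t^j is the number of documents in cluster j that contain term t, |C_j| is the size of cluster j, |C| is the number of clusters, and CF_t is the number of clusters that contain t. The base `b = 1.5` is a hypothetical default. Initial clusters are formed by a round-robin split. The unspecified "regroup" step is replaced by k-means-style reassignment of each document to its highest-scoring centroid. Similarity is a dot product between an L2-normalized term-frequency vector and an unnormalized centroid. All function names (`cfc_centroids`, `acc_cluster`, and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cfc_centroids(docs, labels, b=1.5):
    """Build one CFC-style centroid (term -> weight) per non-empty cluster.

    Assumed weight: b ** (DF_t^j / |C_j|) * log(|C| / CF_t).
    b > 1 is a hypothetical default, not a value taken from the paper.
    """
    members = defaultdict(list)  # cluster id -> list of term sets
    for doc, lab in zip(docs, labels):
        members[lab].append(set(doc))
    n_clusters = len(members)
    # CF_t: number of clusters that contain term t at least once.
    cluster_freq = Counter()
    for term_sets in members.values():
        cluster_freq.update(set().union(*term_sets))
    centroids = {}
    for lab, term_sets in members.items():
        size = len(term_sets)
        df = Counter()  # DF_t^j: documents in this cluster containing t
        for ts in term_sets:
            df.update(ts)
        centroids[lab] = {
            t: b ** (df[t] / size) * math.log(n_clusters / cluster_freq[t])
            for t in df
        }
    return centroids


def doc_vector(doc):
    """L2-normalized term-frequency vector (an assumed representation)."""
    tf = Counter(doc)
    norm = math.sqrt(sum(v * v for v in tf.values())) or 1.0
    return {t: v / norm for t, v in tf.items()}


def score(vec, centroid):
    """Dot product of a normalized document with an unnormalized centroid."""
    return sum(w * centroid.get(t, 0.0) for t, w in vec.items())


def acc_cluster(docs, n_init=100, max_iter=50, b=1.5):
    """Hypothetical ACC-style loop: start with many small clusters, then regroup."""
    # Assumed initialization: round-robin split into n_init small clusters.
    labels = [i % n_init for i in range(len(docs))]
    vecs = [doc_vector(d) for d in docs]
    for _ in range(max_iter):
        centroids = cfc_centroids(docs, labels, b)
        # Move each document to its best centroid. Clusters that attract no
        # documents vanish, so the number of clusters can shrink on its own.
        new_labels = [max(centroids, key=lambda c: score(v, centroids[c]))
                      for v in vecs]
        if new_labels == labels:  # converged: no document moved
            break
        labels = new_labels
    # Renumber the surviving clusters 0..k-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(lab, len(remap)) for lab in labels]


docs = [
    ["apple", "banana", "fruit"], ["cpu", "gpu", "chip"],
    ["apple", "banana", "fruit"], ["cpu", "gpu", "chip"],
    ["apple", "fruit"], ["gpu", "chip"],
]
print(acc_cluster(docs))
```

On this toy corpus, each document starts in its own cluster. The first reassignment merges the corpus into two topical groups, and the second pass changes nothing. The call therefore prints `[0, 1, 0, 1, 0, 1]`, and the number of clusters is determined by the loop rather than supplied up front. In this sketch, ties go to the first cluster encountered, because Python's `max` returns the first maximal item. That ordering is a design choice of this example, not something stated in the abstract.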
Keywords: Clustering algorithms, Vectors, Entropy, Partitioning algorithms, Measurement, Frequency modulation, Algorithm design and analysis, adaptively, document clustering, Class-Feature-Centroid
Ximing Li, Jihong Ouyang, Xiaotang Zhou, Bo Fu, "Adaptive Centroid-Based Clustering Algorithm for Text Document Data", 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 63-68, 2014, doi:10.1109/PAAP.2014.13