Parallel Architectures, Algorithms and Programming, International Symposium on (2010)
Dalian, Liaoning, China
Dec. 18, 2010 to Dec. 20, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PAAP.2010.55
This paper presents an effective XML clustering method, the neighbor center clustering (NCC) algorithm, which computes document similarity from both the structural and the content information contained in XML files. Structural similarity is measured using the Longest Common Subsequence, and content similarity is computed using TF-IDF. The algorithm reduces computational complexity by avoiding a direct search for cluster centers. Experiments show that NCC achieves high purity and F-measure values and is suitable for clustering XML documents with both homogeneous and heterogeneous structures.
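The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ingredients it names: a Longest Common Subsequence score over tag sequences and a TF-IDF cosine score over text terms. The normalization by the longer sequence, the weighting parameter `alpha`, and all function names are assumptions made for illustration, not the authors' definitions, and the NCC clustering step itself is not shown.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structural_similarity(tags_a, tags_b):
    """LCS length divided by the longer tag sequence (assumed normalization)."""
    if not tags_a or not tags_b:
        return 0.0
    return lcs_length(tags_a, tags_b) / max(len(tags_a), len(tags_b))


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} vector."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(structural, content, alpha=0.5):
    """Weighted mix of the two scores; alpha is a hypothetical parameter."""
    return alpha * structural + (1 - alpha) * content
```

Here `tags_a` and `tags_b` would be the element names of two XML documents in document order, and each entry of `docs` a list of text tokens; how the paper actually linearizes the XML tree and combines the two scores may differ.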
neighbor center cluster, Longest Common Subsequence, structural similarity
Xiu-kun Wang, Chen Liu, Yong Piao, "A Hybrid Method for XML Clustering", Parallel Architectures, Algorithms and Programming, International Symposium on, pp. 286-290, 2010, doi:10.1109/PAAP.2010.55