2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE'05)
Learning the Kernel Matrix for XML Document Clustering
Hong Kong, China
March 29-April 01
ISBN: 0-7695-2274-2
Jianwu Yang, Peking University, China
William K. Cheung, Hong Kong Baptist University, Kowloon Tong
Xiaoou Chen, Peking University, China
The rapid growth of XML adoption has created a need for a proper representation of semi-structured documents, one that takes document structural information into account to support more precise document analysis. In this paper, an XML document representation named the "structured link vector model" is adopted, with a kernel matrix included to model the similarity between XML elements. Our formulation allows individual XML elements to have their own weighted contribution to the overall document similarity while also capturing between-element similarity. An iterative algorithm is derived to learn the kernel matrix. For performance evaluation, the method was tested on the ACM SIGMOD Record dataset and the CEDB dataset. The proposed method significantly outperforms the traditional vector space model and edit-distance-based methods. In addition, the kernel matrix obtained as a by-product provides knowledge about the conceptual relationships between the XML elements.
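The abstract does not state the similarity formula, so the following is only a minimal sketch of one way a kernel matrix over XML elements could enter a document similarity. It assumes each document is stored as one term-weight vector per element type, and that similarity is the kernel-weighted sum of dot products between element vectors. The function name `kernel_similarity`, the toy documents, and the two kernel matrices are hypothetical and are not taken from the paper; the iterative learning of the kernel is not shown.

```python
from typing import List

Vector = List[float]
Document = List[Vector]  # assumption: one term-weight vector per XML element type


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length term-weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def kernel_similarity(doc_x: Document, doc_y: Document,
                      kernel: List[List[float]]) -> float:
    """Assumed bilinear form: sum over (i, j) of kernel[i][j] * <x_i, y_j>.

    kernel[i][j] weights how strongly element i of one document
    is compared against element j of the other.
    """
    total = 0.0
    for i, row in enumerate(kernel):
        for j, weight in enumerate(row):
            if weight:
                total += weight * dot(doc_x[i], doc_y[j])
    return total


# Hypothetical toy data: 2 element types (say, title and abstract), 3 terms.
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

identity = [[1.0, 0.0], [0.0, 1.0]]  # elements compared only with themselves
coupled = [[1.0, 0.5], [0.5, 1.0]]   # title and abstract treated as related

print(kernel_similarity(doc_a, doc_b, identity))  # 0.0
print(kernel_similarity(doc_a, doc_b, coupled))   # 1.0
```

In this toy case the two documents share terms only across different elements, so the identity kernel sees no similarity while the coupled kernel does. Under these assumptions, the diagonal entries play the role of per-element weights and the off-diagonal entries capture between-element similarity.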
Citation:
Jianwu Yang, William K. Cheung, Xiaoou Chen, "Learning the Kernel Matrix for XML Document Clustering," 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE'05), pp. 353-358, 2005