Fourth IEEE International Conference on Data Mining (ICDM'04)
IRC: An Iterative Reinforcement Categorization Algorithm for Interrelated Web Objects
Brighton, United Kingdom
November 1-4, 2004
ISBN: 0-7695-2142-8
Dou Shen, TsingHua University, Beijing, P.R. China
Qiang Yang, Hong Kong University of Science and Technology
Zheng Chen, Microsoft Research Asia, Beijing, P.R. China
Yong Yu, Shanghai Jiao-Tong University, P.R. China
WenSi Xi, Virginia Polytechnic Institute and State University
Wei-Ying Ma, Microsoft Research Asia, Beijing, P.R. China
Most existing categorization algorithms deal with homogeneous Web data objects; when they take the interrelationships with other types of objects into account, they treat the interrelated objects merely as additional features. However, focusing on any single aspect of these interrelationships and objects will not fully reveal their true categories. In this paper, we propose a novel categorization algorithm, the Iterative Reinforcement Categorization algorithm (IRC), to exploit the full interrelationships among heterogeneous objects on the Web. IRC classifies interrelated Web objects by letting the individual classification results of the different object types iteratively reinforce one another through their interrelationships. Experiments on a clickthrough log dataset from the MSN search engine show that, in terms of the F1 measure, IRC achieves a 26.4% improvement over a pure content-based classification method, a 21% improvement over a query metadata-based method, and a 16.4% improvement over a virtual document-based method. Furthermore, our experiments show that IRC converges rapidly.
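The abstract describes the reinforcement idea only at a high level. The sketch below illustrates it in generic form and is not the authors' IRC algorithm. Its assumptions are as follows. There are two object types, queries and pages, linked by click counts from a clickthrough log. Each object starts with a content-based class-score row. On every round, each object's scores are blended with the scores propagated from its linked objects of the other type, using a hypothetical mixing weight `alpha`. The function names, the row-normalized propagation, and the stopping rule are all illustrative choices.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = objects, columns = categories (or linked objects)


def row_normalize(m: Matrix) -> Matrix:
    """Scale each row to sum to 1; all-zero rows stay all-zero."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s for v in row] if s > 0 else [0.0] * len(row))
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def blend(own: Matrix, propagated: Matrix, alpha: float) -> Matrix:
    """alpha * an object's own evidence + (1 - alpha) * evidence from linked objects."""
    return row_normalize([[alpha * o + (1 - alpha) * p for o, p in zip(ro, rp)]
                          for ro, rp in zip(own, propagated)])


def iterative_reinforcement(c_q: Matrix, c_p: Matrix, links: Matrix,
                            alpha: float = 0.5, tol: float = 1e-6,
                            max_iter: int = 100) -> Tuple[Matrix, Matrix, int]:
    """Hypothetical sketch: c_q / c_p are initial content-based class scores for
    queries / pages; links[i][j] is the click weight between query i and page j."""
    c_q, c_p = row_normalize(c_q), row_normalize(c_p)
    w_qp = row_normalize(links)             # how much each query trusts each page
    w_pq = row_normalize(transpose(links))  # how much each page trusts each query
    s_q, s_p = c_q, c_p
    it = 0
    for it in range(1, max_iter + 1):
        # Queries are reinforced by their clicked pages, then pages by their queries.
        new_q = blend(c_q, matmul(w_qp, s_p), alpha)
        new_p = blend(c_p, matmul(w_pq, new_q), alpha)
        delta = max(abs(a - b)
                    for r_new, r_old in zip(new_q + new_p, s_q + s_p)
                    for a, b in zip(r_new, r_old))
        s_q, s_p = new_q, new_p
        if delta < tol:  # stop once no score moves by more than tol
            break
    return s_q, s_p, it


# Toy input: 2 queries, 2 pages, 2 categories.
c_q = [[0.9, 0.1], [0.5, 0.5]]  # query 1 is ambiguous on content alone
c_p = [[0.6, 0.4], [0.2, 0.8]]
links = [[3, 0], [1, 2]]        # query 1 clicked page 0 once and page 1 twice
s_q, s_p, iterations = iterative_reinforcement(c_q, c_p, links)
print(s_q, s_p, iterations)
```

On this toy input, the ambiguous second query moves from 0.5 to roughly 0.47 for category 0. The shift comes from its links: it is mostly linked to the page whose content leans toward category 1. Because every propagation matrix is row-stochastic and `alpha > 0`, each round is a contraction in this sketch, so it settles within a few dozen iterations. Whether IRC's own update has the same property is not stated here.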
Citation:
Gui-Rong Xue, Dou Shen, Qiang Yang, Hua-Jun Zeng, Zheng Chen, Yong Yu, WenSi Xi, Wei-Ying Ma, "IRC: An Iterative Reinforcement Categorization Algorithm for Interrelated Web Objects," in Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 273-280, 2004.