Issue No.04 - April (2006 vol.18)
pp: 448-459
ABSTRACT
Feature selection has been widely applied in text categorization and clustering. Compared to unsupervised selection, supervised feature selection is more successful at filtering out noise in most cases. However, because label information is lacking, clustering can hardly exploit supervised selection. Some studies have proposed solving this problem with a "pseudoclass," in which clustering results stand in for the missing class labels. As empirical results show, this method is sensitive to the selection criteria and data sets used. In this paper, we propose a novel feature coselection method for Web document clustering, called Multitype Features Coselection for Clustering (MFCC). MFCC uses intermediate clustering results in one type of feature space to help the selection in other types of feature spaces. Our experiments show that, for most selection criteria, MFCC effectively reduces the noise introduced by the "pseudoclass" and further improves clustering performance.
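The coselection idea in the abstract can be illustrated with a small sketch. This is not the authors' MFCC algorithm, whose details are not given here; it is a loose interpretation under stated assumptions: two feature spaces per document, a tiny deterministic k-means as the clustering step, and a chi-square score as the selection criterion. All function names, parameters, and the toy data are hypothetical.

```python
# Illustrative sketch only: NOT the paper's MFCC implementation.
# Assumptions: two feature spaces, k-means clustering, chi-square selection.

def kmeans_labels(vectors, init_centers, iters=10):
    """Tiny deterministic k-means on dense lists; returns one label per vector."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))
            for v in vectors
        ]
        # Update step: recompute the mean of each non-empty cluster.
        for c in range(len(centers)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def chi_square_scores(vectors, labels):
    """Score each feature by its max chi-square against the pseudoclass labels."""
    n = len(vectors)
    scores = []
    for f in range(len(vectors[0])):
        best = 0.0
        for c in set(labels):
            a = sum(1 for v, l in zip(vectors, labels) if v[f] > 0 and l == c)
            b = sum(1 for v, l in zip(vectors, labels) if v[f] > 0 and l != c)
            e = sum(1 for v, l in zip(vectors, labels) if v[f] == 0 and l == c)
            d = n - a - b - e
            denom = (a + e) * (b + d) * (a + b) * (e + d)
            if denom:
                best = max(best, n * (a * d - b * e) ** 2 / denom)
        scores.append(best)
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring features, in ascending index order."""
    return sorted(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])

def project(vectors, selected):
    """Keep only the selected feature columns."""
    return [[v[i] for i in selected] for v in vectors]

def coselect(space_a, space_b, init_a, init_b, keep_a, keep_b, rounds=2):
    """Alternate: cluster in one space, use its labels to select in the other."""
    sel_a = list(range(len(space_a[0])))
    sel_b = list(range(len(space_b[0])))
    for _ in range(rounds):
        labels_a = kmeans_labels(project(space_a, sel_a), project(init_a, sel_a))
        sel_b = top_k(chi_square_scores(space_b, labels_a), keep_b)
        labels_b = kmeans_labels(project(space_b, sel_b), project(init_b, sel_b))
        sel_a = top_k(chi_square_scores(space_a, labels_b), keep_a)
    return sel_a, sel_b

# Toy data (hypothetical): four documents, the third feature in each space is noise.
space_a = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
space_b = [[2, 0, 1], [1, 0, 1], [0, 1, 1], [0, 2, 1]]
sel_a, sel_b = coselect(space_a, space_b,
                        init_a=[space_a[0], space_a[2]],
                        init_b=[space_b[0], space_b[2]],
                        keep_a=2, keep_b=2)
print(sel_a, sel_b)  # [0, 1] [0, 1]
```

In this toy run the noise feature is dropped in both spaces because its chi-square score against the other space's pseudoclass labels is zero. A real system would use sparse term vectors and could swap in any other selection criterion.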
INDEX TERMS
Web mining, clustering, feature evaluation and selection.
CITATION
Shen Huang, Zheng Chen, Yong Yu, Wei-Ying Ma, "Multitype Features Coselection for Web Document Clustering," IEEE Transactions on Knowledge & Data Engineering, vol. 18, no. 4, pp. 448-459, April 2006, doi:10.1109/TKDE.2006.63