Issue No. 06 - June (2013 vol. 25)
ISSN: 1041-4347
pp: 1227-1239
Weihong Qian, IBM Research-China, Beijing
Furu Wei, Microsoft Research Asia, Beijing
Michelle X. Zhou, IBM Research - Almaden Center, San Jose
Shixia Liu, Microsoft Research Asia, Beijing
Shimei Pan, IBM Research - T. J. Watson Center, Hawthorne
Yangqiu Song, Microsoft Research Asia, Beijing
ABSTRACT
In this paper, we propose a novel constrained coclustering method to achieve two goals. First, we combine information-theoretic coclustering and constrained clustering to improve clustering performance. Second, we adopt both supervised and unsupervised constraints to demonstrate the effectiveness of our algorithm. The unsupervised constraints are automatically derived from existing knowledge sources, thus saving the effort and cost of using manually labeled constraints. To achieve our first goal, we develop a two-sided hidden Markov random field (HMRF) model to represent both document and word constraints. We then use an alternating expectation maximization (EM) algorithm to optimize the model. To achieve our second goal, we propose two novel methods to automatically construct and incorporate document and word constraints to support unsupervised constrained clustering: 1) constructing document constraints from overlapping named entities (NEs) extracted by an NE extractor; 2) constructing word constraints from the semantic distance between words inferred from WordNet. The results of our evaluation on two benchmark data sets demonstrate the superiority of our approaches over a number of existing approaches.
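As a rough illustration of the first unsupervised constraint source mentioned in the abstract, the sketch below derives must-link document pairs from overlapping named entities. It is only a minimal sketch under stated assumptions: the abstract does not give the actual rule, so the function name `must_link_pairs`, the `min_overlap` threshold, and the sample entity sets are all hypothetical, and the entities are assumed to have already been extracted by some NE extractor.

```python
from itertools import combinations


def must_link_pairs(doc_entities, min_overlap=2):
    """Return (i, j) index pairs of documents whose entity sets overlap.

    doc_entities: list of sets, one set of named-entity strings per document
                  (assumed to come from an external NE extractor).
    min_overlap:  hypothetical threshold on the number of shared entities;
                  this value is an assumption, not taken from the paper.
    """
    pairs = []
    # Compare every unordered pair of documents exactly once.
    for (i, a), (j, b) in combinations(enumerate(doc_entities), 2):
        if len(a & b) >= min_overlap:
            pairs.append((i, j))
    return pairs


# Toy entity sets, invented for illustration only.
docs = [
    {"IBM", "Beijing", "Watson"},
    {"IBM", "Watson", "Almaden"},
    {"Microsoft", "Beijing"},
]
print(must_link_pairs(docs))  # [(0, 1)]
```

Pairs produced this way could then be passed as document-side constraints to a constrained clustering model. The pairwise comparison is quadratic in the number of documents, so a larger collection would likely need an inverted index from entity to documents instead.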
INDEX TERMS
Clustering algorithms, Semantics, Hidden Markov models, Sparse matrices, Clustering methods, Humans, Computational modeling, text clustering, Constrained clustering, coclustering, unsupervised constraints
CITATION
Weihong Qian, Furu Wei, Michelle X. Zhou, Shixia Liu, Shimei Pan, Yangqiu Song, "Constrained Text Coclustering with Supervised and Unsupervised Constraints", IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 6, pp. 1227-1239, June 2013, doi:10.1109/TKDE.2012.45