2007 Seventh IEEE International Conference on Data Mining
Semi-supervised Document Clustering via Active Learning with Pairwise Constraints
Omaha, Nebraska, USA
October 28-October 31
ISBN: 0-7695-3018-4
This paper investigates a framework that discovers pairwise constraints for semi-supervised text document clustering. An active learning approach is proposed to select informative document pairs for which user feedback is obtained. We design a gain-directed document pair selection method that measures how much can be learned by revealing the relationship between a pair of documents. Three different models are proposed, namely an uncertainty model, a generation error model, and an objective function model. Language modeling is investigated for representing clusters in the semi-supervised document clustering approach.
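The idea of gain-directed pair selection can be illustrated with a minimal sketch. This is not the paper's uncertainty model, whose definition is not given in the abstract. It assumes that each document has a soft cluster membership vector, that two documents' assignments are independent, and that the gain of a pair is the binary entropy of the probability that the two documents share a cluster. The function names and the toy `memberships` data are hypothetical.

```python
import math
from itertools import combinations


def same_cluster_probability(p, q):
    """P(two documents share a cluster), assuming independent soft assignments."""
    return sum(a * b for a, b in zip(p, q))


def binary_entropy(x):
    """Entropy in bits of a Bernoulli(x) variable; 0 when x is 0 or 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))


def select_pair(memberships):
    """Return the index pair whose must-link / cannot-link answer is most uncertain.

    Assumption: gain is approximated by the entropy of the same-cluster
    probability; the paper's actual gain models may differ.
    """
    best_pair, best_gain = None, -1.0
    for i, j in combinations(range(len(memberships)), 2):
        prob = same_cluster_probability(memberships[i], memberships[j])
        gain = binary_entropy(prob)
        if gain > best_gain:
            best_pair, best_gain = (i, j), gain
    return best_pair, best_gain


# Hypothetical soft memberships of three documents over two clusters.
memberships = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
]
pair, gain = select_pair(memberships)
print(pair, round(gain, 3))
```

In this toy setting the selected pair would be shown to the user, and the answer would be recorded as a must-link or cannot-link constraint before the documents are clustered again.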