loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
2007 Seventh IEEE International Conference on Data Mining
Using Burstiness to Improve Clustering of Topics in News Streams
Omaha, Nebraska, USA
October 28-October 31
ISBN: 0-7695-3018-4
Specialists who analyze online news have a hard time separating the wheat from the chaff. Moreover, automatic data-mining techniques like clustering of news streams into topical groups can fully recover the underlying true class labels of data if and only if all classes are well separated. In reality, especially for news streams, this is clearly not the case. The question to ask is thus this: if we cannot recover the full C classes by clustering, what is the largest K \le C clusters we can find that best resemble the K underlying classes? Using the intuition that bursty topics are more likely to correspond to important events that are of interest to analysts, we propose several new bursty vector space models (B-VSM) for representing a news document. B-VSM takes into account the burstiness (across the full corpus and whole duration) of each constituent word in a document at the time of publication. We benchmarked our B-VSM against the classical TFIDF-VSM on the task of clustering a collection of news stream articles with known topic labels. Experimental results show that B-VSM was able to find the burstiest clusters/topics. Further, it also significantly improved the recall and precision for the top K clusters/topics.
Citation:
Qi He, Kuiyu Chang, Ee-Peng Lim, "Using Burstiness to Improve Clustering of Topics in News Streams," icdm, pp.493-498, 2007 Seventh IEEE International Conference on Data Mining, 2007
Usage of this product signifies your acceptance of the Terms of Use.