2008 19th International Conference on Database and Expert Systems Application
Topic Detection by Clustering Keywords
September 01-September 05
ISBN: 978-0-7695-3299-8
We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term distribution that takes co-occurrence of terms into account gives the best results.
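To illustrate the two kinds of measure the abstract compares, here is a minimal Python sketch of the Jensen-Shannon divergence and the cosine similarity between two term distributions. It is not the authors' implementation: the function names and the toy distributions are hypothetical, and the paper's own term distributions (including the co-occurrence-based one) are not reproduced here.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) of two discrete distributions.

    p and q are equal-length sequences of probabilities that each sum to 1.
    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical toy distributions of two keywords over three context terms.
keyword_a = [0.5, 0.5, 0.0]
keyword_b = [0.0, 0.5, 0.5]

jsd = jensen_shannon_divergence(keyword_a, keyword_b)
cos = cosine_similarity(keyword_a, keyword_b)
```

Note that the Jensen-Shannon divergence is a distance-like quantity (smaller means more similar), whereas cosine similarity grows with similarity, so a clustering routine has to treat the two with opposite orientation.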
Index Terms:
Topic detection, Keywords, Jensen-Shannon Divergence, Clustering, Natural Language Processing, Data mining
Citation:
Christian Wartena, Rogier Brussee, "Topic Detection by Clustering Keywords," DEXA, pp. 54-58, 2008 19th International Conference on Database and Expert Systems Application, 2008