19th IEEE International Conference on Tools with Artificial Intelligence - Vol.1 (ICTAI 2007)
Finding Hotspots in Document Collection
Paris, France
October 29-October 31
ISBN: 0-7695-3015-X
Given a document collection, it is often desirable to find the core subset of documents focusing on a specific topic. Document clustering aims at partitioning document-term datasets into different groups by optimizing certain objective functions. However, clustering methods are not suitable for finding hotspots, which are described by a small set of documents with a few tightly coupled terms. In this paper we propose a novel hotspot-finding algorithm for document collections, DCC (Dense Concept Clustering). DCC can extract distinct small topics together with their most representative documents and words simultaneously. The hotspots are dense bicliques in binary document-word matrices, and they can be discovered sequentially, one at a time, using the generalized Motzkin-Straus formalism. The representative documents and words are tightly correlated and together describe each concept. Experiments on real document datasets show the effectiveness of the proposed algorithm.
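To make the idea of a dense biclique in a binary document-word matrix concrete, here is a minimal sketch. It is not the DCC algorithm from the paper, whose update rules the abstract does not give. It assumes one common Motzkin-Straus-style relaxation: maximize the bilinear form x^T A y over nonnegative vectors x (documents) and y (words) with unit L_p norm, solved by alternating maximization. The function names, the value p = 1.5, the 0.5 cutoff, and the toy matrix are all illustrative assumptions.

```python
def lp_normalize(v, p):
    """Scale a nonnegative vector to unit L_p norm (returned unchanged if all zero)."""
    norm = sum(t ** p for t in v) ** (1.0 / p)
    return [t / norm for t in v] if norm > 0 else list(v)


def dense_block(A, p=1.5, iters=30, keep=0.5):
    """Alternately maximize x^T A y subject to ||x||_p = ||y||_p = 1, x, y >= 0.

    For fixed y, the maximizer over x is proportional to (A y)^(1/(p-1))
    (Hoelder's inequality), and symmetrically for y. Rows and columns whose
    weight is at least `keep` times the largest weight form the reported block.
    """
    n_docs, n_words = len(A), len(A[0])
    q = 1.0 / (p - 1.0)
    x = [0.0] * n_docs
    y = lp_normalize([1.0] * n_words, p)  # uniform start (an assumption)
    for _ in range(iters):
        ay = [sum(A[i][j] * y[j] for j in range(n_words)) for i in range(n_docs)]
        x = lp_normalize([t ** q for t in ay], p)
        atx = [sum(A[i][j] * x[i] for i in range(n_docs)) for j in range(n_words)]
        y = lp_normalize([t ** q for t in atx], p)
    docs = [i for i, t in enumerate(x) if t >= keep * max(x)]
    words = [j for j, t in enumerate(y) if t >= keep * max(y)]
    return docs, words


# Toy binary matrix: rows are documents, columns are words.
# Documents 0-2 share words 0-2 (a planted 3x3 biclique); the rest is sparse.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
docs, words = dense_block(A)
print(docs, words)  # [0, 1, 2] [0, 1, 2]
```

One way to mimic the sequential, one-at-a-time discovery mentioned above would be to zero out the entries of the block just found and run the routine again; whether the paper proceeds this way is not stated in the abstract.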
Citation:
Wei Peng, Chris Ding, Tao Li, Tong Sun, "Finding Hotspots in Document Collection," 19th IEEE International Conference on Tools with Artificial Intelligence - Vol.1 (ICTAI 2007), vol. 1, pp. 313-320, 2007