2008 19th International Conference on Database and Expert Systems Application
Proximity Estimation and Hardness of Short-Text Corpora
September 01-September 05
ISBN: 978-0-7695-3299-8
In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures. Our approach attempts to establish a connection between the hardness of a corpus and the precision level exhibited by similarity measures, according to the results obtained with different cluster validity measures on the "ideal" clustering of each corpus. Moreover, we also propose a new validity measure, named contiguity error, that allowed us to observe this connection in a consistent way in all the collections considered.
Index Terms:
clustering, short-text corpora, proximity estimation, cluster validity measures
Citation:
Marcelo Luis Errecalde, Diego Ingaramo, Paolo Rosso, "Proximity Estimation and Hardness of Short-Text Corpora," DEXA, pp. 15-19, 2008 19th International Conference on Database and Expert Systems Application, 2008