Semantic Smoothing for Model-based Document Clustering
Sixth IEEE International Conference on Data Mining (ICDM'06)
Hong Kong, December 18-22, 2006
ISBN: 0-7695-2701-9
ASCII Text:
Xiaodan Zhang, Xiaohua Zhou, Xiaohua Hu, "Semantic Smoothing for Model-based Document Clustering," Data Mining, IEEE International Conference on, pp. 1193-1198, Sixth IEEE International Conference on Data Mining (ICDM'06), 2006.

BibTeX:
@inproceedings{10.1109/ICDM.2006.142,
  author = {Xiaodan Zhang and Xiaohua Zhou and Xiaohua Hu},
  title = {Semantic Smoothing for Model-based Document Clustering},
  booktitle = {Sixth IEEE International Conference on Data Mining (ICDM'06)},
  year = {2006},
  issn = {1550-4786},
  pages = {1193-1198},
  doi = {http://doi.ieeecomputersociety.org/10.1109/ICDM.2006.142},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}

RIS (RefWorks, ProCite, Reference Manager, EndNote):
TY  - CONF
JO  - Data Mining, IEEE International Conference on
TI  - Semantic Smoothing for Model-based Document Clustering
SN  - 1550-4786
SP  - 1193
EP  - 1198
A1  - Xiaodan Zhang
A1  - Xiaohua Zhou
A1  - Xiaohua Hu
PY  - 2006
JA  - Data Mining, IEEE International Conference on
ER  - 
DOI: http://doi.ieeecomputersociety.org/10.1109/ICDM.2006.142
A document is often full of class-independent "general" words and short of class-specific "core" words, and both properties make document clustering difficult. We argue that both problems will be relieved by suitable smoothing of document models in agglomerative approaches and of cluster models in partitional approaches, and that clustering quality will improve as a result. To the best of our knowledge, most model-based clustering approaches use Laplacian smoothing to prevent zero probabilities, while most similarity-based approaches employ the heuristic TF*IDF scheme to discount the effect of "general" words. Inspired by a series of statistical translation language models for text retrieval, we propose in this paper a novel smoothing method for document clustering, referred to as context-sensitive semantic smoothing. A comparative experiment on three datasets shows that model-based clustering approaches with semantic smoothing are effective in improving cluster quality.
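The abstract contrasts Laplacian smoothing with smoothing inspired by statistical translation language models. The Python sketch below is a hedged illustration of that contrast, not the authors' implementation or their exact formula. It assumes a mixture form in which a background-smoothed document model is interpolated with a "translation" component p(w|t), mapping topic signatures t (assumed here to be multiword phrases already extracted from the document) to individual words. The function names `laplace_smoothed` and `semantic_smoothed`, the weights `lam` and `beta`, and the toy data are all hypothetical, and p(w|t) is assumed to have been estimated offline.

```python
from collections import Counter


def laplace_smoothed(doc_tokens, vocab):
    """Add-one (Laplacian) smoothing: p(w|d) = (c(w,d) + 1) / (|d| + |V|)."""
    counts = Counter(doc_tokens)
    denom = len(doc_tokens) + len(vocab)
    return {w: (counts[w] + 1) / denom for w in vocab}


def semantic_smoothed(doc_tokens, doc_topics, translation, background,
                      lam=0.3, beta=0.5):
    """Translation-style smoothing (illustrative mixture; lam and beta are hypothetical).

    p_b(w|d) = (1 - beta) * p_ml(w|d) + beta * p(w|C)   # background smoothing
    p_t(w|d) = sum_k p(w|t_k) * p_ml(t_k|d)              # topic signatures -> words
    p(w|d)   = (1 - lam) * p_b(w|d) + lam * p_t(w|d)
    Assumes `background` covers the whole vocabulary and each translation[t] sums to 1.
    """
    word_counts = Counter(doc_tokens)
    topic_counts = Counter(doc_topics)
    n_words, n_topics = len(doc_tokens), len(doc_topics)
    if n_words == 0:
        beta = 1.0  # empty document: rely on the background model alone
    if n_topics == 0:
        lam = 0.0   # no topic signatures: drop the translation component
    model = {}
    for w, p_c in background.items():
        p_ml = word_counts[w] / n_words if n_words else 0.0
        p_b = (1 - beta) * p_ml + beta * p_c
        # Sum over the document's topic signatures; empty when n_topics == 0.
        p_t = sum(translation.get(t, {}).get(w, 0.0) * c / n_topics
                  for t, c in topic_counts.items())
        model[w] = (1 - lam) * p_b + lam * p_t
    return model


# Toy, hypothetical data: one topic signature "car engine" for a tiny document.
background = {"the": 0.4, "car": 0.2, "engine": 0.2, "price": 0.2}
translation = {"car engine": {"car": 0.4, "engine": 0.5, "price": 0.1}}
doc = ["the", "car", "the"]
topics = ["car engine"]

lap = laplace_smoothed(doc, background.keys())
sem = semantic_smoothed(doc, topics, translation, background)
for w in background:
    print(f"{w:>7}  laplace={lap[w]:.3f}  semantic={sem[w]:.3f}")
```

In this toy run, the general word "the" receives less probability than under add-one smoothing, while "engine", which never occurs in the document, receives more because the topic signature "car engine" points to it. This mirrors, in miniature, the two effects the abstract attributes to semantic smoothing: discounting "general" words and supplying missing "core" words.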
