2005 International Conference on Cyberworlds (CW'05) (2005)
Nov. 23, 2005 to Nov. 25, 2005
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CW.2005.83
Minoru Sasaki, Ibaraki University, Japan
Hiroyuki Shinnou, Ibaraki University, Japan
We propose a new spam detection technique that uses text clustering based on the vector space model. Our method automatically computes disjoint clusters over all spam and non-spam mails using a spherical k-means algorithm and obtains the centroid vector of each cluster, from which the cluster description is extracted. Each centroid vector is assigned a label ("spam" or "non-spam") by counting the spam emails in its cluster. When a new mail arrives, the cosine similarity between the new mail vector and each centroid vector is calculated, and the label of the most relevant cluster is assigned to the new mail. With this method, we can extract many kinds of topics from spam and non-spam email and detect spam email efficiently. In this paper, we describe our spam detection system and show the results of our experiments using the Ling-Spam test collection.
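The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the toy mails, the raw term-frequency weighting, seeding centroids with the first k vectors, and the majority-vote labeling rule are all choices made here for the example. The function names (`tf_vector`, `spherical_kmeans`, `label_clusters`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def tf_vector(text, vocab):
    """Unit-length term-frequency vector over a fixed vocabulary.
    Assumption: plain term counts; the paper's weighting may differ."""
    counts = Counter(text.lower().split())
    return normalize([float(counts[w]) for w in vocab])


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, max_iter=50):
    """Cluster unit vectors by cosine similarity.
    Assumption: centroids are seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no mail changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                # New centroid is the normalized sum of the member vectors.
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return centroids, assignment


def label_clusters(assignment, labels, k):
    """Assumption: a cluster is 'spam' if more than half its mails are spam."""
    result = []
    for c in range(k):
        members = [lab for lab, a in zip(labels, assignment) if a == c]
        spam = sum(1 for lab in members if lab == "spam")
        result.append("spam" if spam * 2 > len(members) else "non-spam")
    return result


def classify(text, vocab, centroids, cluster_labels):
    """Give a new mail the label of its most similar centroid."""
    vec = tf_vector(text, vocab)
    best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
    return cluster_labels[best]


# Toy training data (invented for illustration, not from Ling-Spam).
mails = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("buy cheap watches now", "spam"),
    ("monday meeting notes attached", "non-spam"),
]
vocab = sorted({w for text, _ in mails for w in text.split()})
vectors = [tf_vector(text, vocab) for text, _ in mails]
labels = [label for _, label in mails]

centroids, assignment = spherical_kmeans(vectors, k=2)
cluster_labels = label_clusters(assignment, labels, k=2)
print(classify("buy cheap pills", vocab, centroids, cluster_labels))
```

Because every vector is normalized to unit length, the dot product serves directly as the cosine similarity, which is what distinguishes spherical k-means from the Euclidean variant. In practice the number of clusters would be much larger than two so that distinct topics within spam and non-spam mail can form their own clusters.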
M. Sasaki and H. Shinnou, "Spam Detection Using Text Clustering," 2005 International Conference on Cyberworlds (CW'05), Singapore, 2005, pp. 316-319.