2005 International Conference on Cyberworlds (CW'05)
Spam Detection Using Text Clustering
Singapore
November 23-November 25
ISBN: 0-7695-2378-1
Minoru Sasaki, Ibaraki University, Japan
Hiroyuki Shinnou, Ibaraki University, Japan
We propose a new spam detection technique that uses text clustering based on the vector space model. Our method automatically computes disjoint clusters over all spam and non-spam mails using a spherical k-means algorithm and obtains the centroid vector of each cluster, which serves as the cluster description. Each centroid vector is assigned a label ("spam" or "non-spam") according to the number of spam emails in its cluster. When a new mail arrives, the cosine similarity between the new mail vector and each centroid vector is calculated, and the label of the most relevant cluster is assigned to the new mail. With this method, we can extract many kinds of topics from spam and non-spam email and detect spam email efficiently. In this paper, we describe our spam detection system and show the results of our experiments using the Ling-Spam test collection.
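The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: plain term-frequency weighting, whitespace tokenization, a deterministic farthest-first initialization, and a majority vote for labeling clusters. All function names and the toy mails are hypothetical and stand in for the Ling-Spam data.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def tf_vector(text, vocab):
    """Unit-length term-frequency vector (assumed weighting, not from the paper)."""
    counts = Counter(text.lower().split())
    return normalize([float(counts[w]) for w in vocab])


def nearest(vec, centroids):
    """Index of the centroid with the highest cosine similarity to vec."""
    return max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))


def spherical_kmeans(vectors, k, iters=20):
    """Spherical k-means: cosine assignment, centroids re-normalized each round."""
    # Assumed initialization: start from the first vector, then repeatedly
    # add the vector least similar to the centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        assign = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    assign = [nearest(v, centroids) for v in vectors]
    return centroids, assign


def label_clusters(assign, labels, k):
    """Label each cluster by majority vote (assumed rule; ties and empty clusters give "non-spam")."""
    out = []
    for c in range(k):
        members = [lab for a, lab in zip(assign, labels) if a == c]
        spam = sum(1 for lab in members if lab == "spam")
        out.append("spam" if 2 * spam > len(members) else "non-spam")
    return out


def classify(text, vocab, centroids, cluster_labels):
    """Give a new mail the label of its most similar centroid."""
    return cluster_labels[nearest(tf_vector(text, vocab), centroids)]


# Toy training data (hypothetical).
train_mails = [
    "free money offer now",
    "win free money prize",
    "free prize offer click",
    "meeting agenda for monday",
    "project meeting notes attached",
    "monday project agenda review",
]
train_labels = ["spam", "spam", "spam", "non-spam", "non-spam", "non-spam"]

vocab = sorted({w for mail in train_mails for w in mail.lower().split()})
vectors = [tf_vector(mail, vocab) for mail in train_mails]
centroids, assign = spherical_kmeans(vectors, k=2)
cluster_labels = label_clusters(assign, train_labels, k=2)

print(classify("claim your free prize money", vocab, centroids, cluster_labels))  # spam
print(classify("agenda for the project meeting", vocab, centroids, cluster_labels))  # non-spam
```

Because every vector is normalized to unit length, the dot product used for assignment is the cosine similarity the abstract refers to. In practice k would be larger than 2, so that separate clusters can capture different spam and non-spam topics.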
Citation:
Minoru Sasaki, Hiroyuki Shinnou, "Spam Detection Using Text Clustering," cw, pp.316-319, 2005 International Conference on Cyberworlds (CW'05), 2005