2016 International Conference on Big Data and Smart Computing (BigComp) (2016)
Hong Kong, China
Jan. 18, 2016 to Jan. 20, 2016
Jong Myoung Kim, School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Korea
Zae Myung Kim, School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Korea
Kwangjo Kim, School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Korea
Previous research in spam detection, especially in email spam filtering, has mainly focused on learning a set of discriminative features that are often present in spam content. Such commercially oriented spam is now detected well; the real challenge lies in filtering vaguer spam that does not exhibit distinctive spam keywords. We investigate two ways of detecting such spam: 1) comparing the similarity between publisher posts and user comments, and 2) learning a single representative meta-feature such as a user name or ID. The first measure relieves us from repeatedly learning a set of domain-dependent spam features, and the second enables us to detect potential spam users even before they take aggressive actions. Prior to the language model comparison in the first method, we supplement the background information, normalize the text, perform co-reference resolution, and measure word-to-word similarity, in the hope of enriching the language models and improving classification accuracy. To evaluate the first measure, we conduct experiments on detecting blog-spam comments. For the second measure, we apply an SVM to the ID space of e-mail data collected by "Apache Spam Assassin".
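As a rough illustration of the first measure, the sketch below compares a publisher post with a user comment through smoothed unigram language models. The abstract does not state which divergence, smoothing scheme, or tokenizer the authors use, so the Jensen-Shannon divergence, the add-alpha smoothing, and the names `tokenize`, `unigram_model`, `js_divergence`, and `comment_divergence` are all assumptions made for this example. The enrichment steps mentioned above (background information, co-reference resolution, word-to-word similarity) are not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude normalization (assumed): lowercase, keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 means identical distributions."""
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a)

    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def comment_divergence(post, comment, alpha=1.0):
    """Divergence between the post's and the comment's language models.

    Larger values suggest the comment is less related to the post.
    """
    post_tokens = tokenize(post)
    comment_tokens = tokenize(comment)
    vocab = set(post_tokens) | set(comment_tokens)
    if not vocab:
        return 0.0
    p = unigram_model(post_tokens, vocab, alpha)
    q = unigram_model(comment_tokens, vocab, alpha)
    return js_divergence(p, q)


# Hypothetical example texts, not taken from the paper's data.
post = "We trained a support vector machine on email headers to detect spam"
on_topic = "Did the support vector machine use email headers only?"
off_topic = "Great site, visit my page for cheap watches"

d_on = comment_divergence(post, on_topic)
d_off = comment_divergence(post, off_topic)
```

In a filter built along these lines, a comment whose divergence from the post exceeds some threshold would be flagged as a spam candidate. The threshold would have to be tuned on labeled data, and nothing here reflects the paper's reported settings.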
Electronic mail, Feature extraction, Blogs, Encyclopedias, Support vector machines, Information filtering
Jong Myoung Kim, Zae Myung Kim and Kwangjo Kim, "An approach to spam comment detection through domain-independent features," 2016 International Conference on Big Data and Smart Computing (BigComp), Hong Kong, China, 2016, pp. 273-276.