Congfu Xu, Institute of Artificial Intelligence, College of Computer Science, Hangzhou
Baojun Su, Institute of Artificial Intelligence, College of Computer Science, Hangzhou
Yunbiao Cheng, Institute of Artificial Intelligence, College of Computer Science, Zhejiang University, Hangzhou
Weike Pan, College of Computer Science and Software Engineering, Shenzhen
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MIS.2013.54
Spam detection has become a critical component of various online systems, such as email services, advertising engines, and social media sites. Spam is both diverse and dynamic, and a single online learner, as deployed by many commercial systems, is usually not sufficient to capture its different aspects and may therefore fail to learn the model parameters accurately. In this paper, we take email services as an example and present an adaptive fusion algorithm for spam detection (AFSD), a general content-based approach that can be applied to non-email spam detection tasks with little additional effort. In our proposed algorithm, we (1) use n-grams of non-tokenized text strings to represent an email, (2) introduce a link function to make the prediction scores of the online learners more comparable, (3) train the online learners in a mistake-driven manner via "thick thresholding" to obtain highly competitive online learners, and (4) design update rules that adaptively integrate the online learners to capture different aspects of spam. We study the prediction performance of AFSD on five public competition datasets and one industry dataset, and observe that our algorithm achieves significantly better results than several state-of-the-art approaches, including the champion solutions of the corresponding competitions.
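The four ingredients listed in the abstract can be illustrated with a small Python sketch. It is not the AFSD implementation: the abstract does not specify the base learners, the link function, or the update rules, so everything here is an assumption made for illustration. The sketch uses character n-grams of the raw string as features, a logistic function as the link, online logistic regression trained only on mistakes or low-confidence predictions as a stand-in for "thick thresholding", and an exponentially weighted (Hedge-style) rule in place of the paper's fusion update rules. All names (`char_ngrams`, `OnlineLogistic`, `Fusion`, `lr`, `margin`, `eta`) are hypothetical.

```python
import math


def char_ngrams(text, n=4):
    """Overlapping character n-grams of the raw, non-tokenized string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def sigmoid(z):
    """Logistic link (an assumption): maps a raw score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogistic:
    """Online logistic regression over binary n-gram features."""

    def __init__(self, lr=0.1, margin=0.25):
        self.w = {}           # sparse weights, one per n-gram seen so far
        self.lr = lr          # learning rate (illustrative value)
        self.margin = margin  # half-width of the "thick" band around 0.5

    def score(self, feats):
        return sigmoid(sum(self.w.get(f, 0.0) for f in feats))

    def update(self, feats, label):
        p = self.score(feats)
        mistake = (p >= 0.5) != (label == 1)
        # Thick thresholding as sketched here: update on a mistake, or when
        # the prediction is correct but falls inside the band around 0.5.
        if mistake or abs(p - 0.5) < self.margin:
            step = self.lr * (label - p)
            for f in feats:
                self.w[f] = self.w.get(f, 0.0) + step


class Fusion:
    """Weighted combination of learners; weights adapt to recent losses."""

    def __init__(self, ngram_sizes=(3, 4), eta=1.0):
        self.learners = [(n, OnlineLogistic()) for n in ngram_sizes]
        self.alpha = [1.0 / len(self.learners)] * len(self.learners)
        self.eta = eta  # how quickly fusion weights react (illustrative)

    def predict(self, text):
        scores = [m.score(char_ngrams(text, n)) for n, m in self.learners]
        fused = sum(a * s for a, s in zip(self.alpha, scores))
        return fused, scores

    def learn(self, text, label):
        fused, scores = self.predict(text)
        # Hedge-style rule (assumed, not the paper's): shrink the weight of
        # each learner in proportion to its squared error on this example.
        raw = [a * math.exp(-self.eta * (s - label) ** 2)
               for a, s in zip(self.alpha, scores)]
        total = sum(raw)
        self.alpha = [r / total for r in raw]
        for n, m in self.learners:
            m.update(char_ngrams(text, n), label)
        return fused
```

In use, each incoming message would first be scored with `predict` and then passed to `learn` once its true label (1 for spam, 0 for legitimate mail) becomes available, so that both the base learners and the fusion weights track changes in the spam stream.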
Congfu Xu, Baojun Su, Yunbiao Cheng, Weike Pan, "An Adaptive Fusion Algorithm for Spam Detection", IEEE Intelligent Systems, no. 1, pp. 1, PrePrints, doi:10.1109/MIS.2013.54