Issue No. 07 - July (2018 vol. 30)
ISSN: 1041-4347
pp: 1338-1351
Shuji Hao, Institute of High Performance Computing, Agency for Science Technology and Research, Singapore, Singapore
Jing Lu, School of Information Systems, Singapore Management University, Singapore, Singapore
Peilin Zhao, School of Software Engineering, South China University of Technology, Guangzhou, China
Chi Zhang, Interdisciplinary Graduate School, Nanyang Technological University, Singapore, Singapore
Steven C.H. Hoi, School of Information Systems, Singapore Management University, Singapore, Singapore
Chunyan Miao, School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
ABSTRACT
The goal of online active learning is to learn predictive models from a sequence of unlabeled data given a limited label query budget. Online active learning is considerably more challenging than conventional online learning for two reasons. First, it is difficult to design an effective query strategy that decides when it is appropriate to query the label of an incoming instance under a limited query budget. Second, it is also challenging to decide how to update the predictive model effectively whenever the true label of an instance is queried. Most existing approaches to online active learning are based on a family of first-order online learning algorithms, which are simple and efficient but suffer from slow convergence and reach sub-optimal solutions when exploiting the labeled training data. To address these issues, this paper presents a novel framework of Second-order Online Active Learning (SOAL) that fully exploits both first-order and second-order information. The proposed algorithms are able to learn effectively online, maximize predictive accuracy, and minimize labeling cost. To make SOAL more practical for real-world applications, especially class-imbalanced online classification tasks (e.g., malicious web detection), we extend the SOAL framework with a Cost-sensitive Second-order Online Active Learning algorithm named "SOAL$_{CS}$", which is devised by maximizing the weighted sum of sensitivity and specificity or by minimizing the weighted cost of mistakes on the different classes. We conduct both theoretical analysis and empirical studies, including an extensive set of experiments on a variety of large-scale real-world datasets, in which the promising empirical results validate the efficacy and scalability of the proposed algorithms for large-scale online learning tasks.
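The two decisions described in the abstract (when to query a label, and how to update the model once a label arrives) can be illustrated with a minimal sketch. This is not the SOAL algorithm from the paper, whose details the abstract does not give. It assumes a randomized margin-based query rule and an AROW-style second-order update as stand-ins, and every name and parameter below (SecondOrderActiveLearner, delta, r, eta_pos) is hypothetical. The helper at the end computes a weighted sum of sensitivity and specificity, the kind of measure the cost-sensitive variant is said to target, and assumes labels in {-1, +1} with both classes present.

```python
import numpy as np


class SecondOrderActiveLearner:
    """Illustrative sketch only; NOT the SOAL algorithm from the paper."""

    def __init__(self, dim, delta=1.0, r=1.0, seed=0):
        self.w = np.zeros(dim)      # first-order information: mean weight vector
        self.Sigma = np.eye(dim)    # second-order information: weight covariance
        self.delta = delta          # hypothetical knob: larger means more queries
        self.r = r                  # hypothetical regularization parameter
        self.rng = np.random.default_rng(seed)
        self.queries = 0            # number of labels requested so far

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

    def should_query(self, x):
        # Assumed query rule: ask for the label with probability
        # delta / (delta + |margin|), so low-margin instances are queried more.
        margin = abs(self.w @ x)
        return self.rng.random() < self.delta / (self.delta + margin)

    def update(self, x, y):
        # AROW-style update (an assumption, swapped in for illustration):
        # act only when the hinge loss max(0, 1 - y * w.x) is positive.
        margin = y * (self.w @ x)
        if margin >= 1.0:
            return
        Sx = self.Sigma @ x
        beta = 1.0 / (x @ Sx + self.r)
        alpha = (1.0 - margin) * beta
        self.w += alpha * y * Sx
        self.Sigma -= beta * np.outer(Sx, Sx)

    def step(self, x, oracle):
        """Predict for x, then query the oracle and update only if selected."""
        y_hat = self.predict(x)
        if self.should_query(x):
            self.queries += 1
            self.update(x, oracle(x))
        return y_hat


def weighted_sensitivity_specificity(y_true, y_pred, eta_pos=0.5):
    """Weighted sum eta_pos * sensitivity + (1 - eta_pos) * specificity."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = ~pos
    sensitivity = np.mean(y_pred[pos] == 1)    # true positive rate
    specificity = np.mean(y_pred[neg] == -1)   # true negative rate
    return eta_pos * sensitivity + (1 - eta_pos) * specificity
```

In this sketch the covariance matrix shrinks along directions that have already been seen, so later updates move the weights less in those directions. That is one common way second-order information is used in online learning, and it is offered here only as an analogy to the framework the abstract describes.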
INDEX TERMS
Algorithm design and analysis, Prediction algorithms, Predictive models, Labeling, Machine learning algorithms, Training
CITATION

S. Hao, J. Lu, P. Zhao, C. Zhang, S. C. Hoi and C. Miao, "Second-Order Online Active Learning and Its Applications," in IEEE Transactions on Knowledge & Data Engineering, vol. 30, no. 7, pp. 1338-1351, 2018.
doi:10.1109/TKDE.2017.2778097