2013 12th International Conference on Document Analysis and Recognition (2013)
Washington, DC, USA
Aug. 25, 2013 to Aug. 28, 2013
In this paper, we investigate a specific area of document classification in which documents arrive as a stream over time. Moreover, the exact number of document classes is not known from the beginning and may evolve over time. Performing classification in such a setting requires classifiers that can learn incrementally and adapt their model over time. More specifically, we focus our study on SVM approaches, which are known to perform well and for which incremental (i-SVM) procedures exist. However, most of these procedures can only handle a fixed number of classes. We therefore designed a new incremental learning procedure based on one-class SVMs. It improves its classification accuracy over time as new labeled data arrive, without performing any complete retraining. Moreover, when instances arrive with a previously unseen label (the appearance of a new class), the training procedure modifies the classifier model so that it recognizes this new kind of document. While waiting to collect document images as a stream, we ran first experiments on the Optical Recognition of Handwritten Digits Data Set. These experiments show that our incremental approach is able to perform, at each time step, as well as a static one-class classifier fully retrained on all previously seen data, and to model new incoming classes very quickly and efficiently.
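The general idea of building a classifier from one one-class model per class, so that a new label only requires fitting one new model, can be illustrated with the minimal Python sketch below. It is not the authors' procedure: it assumes a ν-one-class SVM with an RBF kernel, solved by a simple SMO-style pairwise solver, and it assumes that raw decision values can be compared across classes. When an existing class receives new samples, the sketch simply refits that class's model from its stored samples; this is a simplification and not the true incremental (i-SVM) update described in the paper. The names `OneClassSVM`, `IncrementalOneClassEnsemble`, `partial_fit`, `nu` and `gamma` are hypothetical and chosen for illustration only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OneClassSVM:
    """Sketch of a nu-one-class SVM solved with SMO-style pairwise updates.

    Dual problem: min 1/2 a^T K a  s.t.  0 <= a_i <= 1/(nu*n),  sum(a) = 1.
    """

    def __init__(self, nu: float = 0.5, gamma: float = 0.5,
                 max_iter: int = 1000, tol: float = 1e-6):
        assert 0.0 < nu <= 1.0, "nu must lie in (0, 1]"
        self.nu, self.gamma, self.max_iter, self.tol = nu, gamma, max_iter, tol

    def fit(self, X: List[Vector]) -> "OneClassSVM":
        n = len(X)
        self.X = [list(x) for x in X]
        C = 1.0 / (self.nu * n)  # box constraint on each alpha
        K = [[rbf(a, b, self.gamma) for b in self.X] for a in self.X]
        alpha = [1.0 / n] * n  # feasible start because nu <= 1
        g = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # gradient K @ alpha
        eps = 1e-12
        for _ in range(self.max_iter):
            # Maximal violating pair: move mass from j (large gradient) to i (small gradient).
            up = [k for k in range(n) if alpha[k] < C - eps]
            down = [k for k in range(n) if alpha[k] > eps]
            if not up or not down:
                break
            i = min(up, key=lambda k: g[k])
            j = max(down, key=lambda k: g[k])
            if g[j] - g[i] < self.tol:
                break  # KKT conditions hold within tolerance
            eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
            t = min(C - alpha[i], alpha[j])  # largest step that respects the box
            if eta > eps:
                t = min(t, (g[j] - g[i]) / eta)  # unconstrained optimum along the pair
            alpha[i] += t
            alpha[j] -= t
            for k in range(n):
                g[k] += t * (K[k][i] - K[k][j])
        self.alpha = alpha
        # rho: gradient value on free support vectors, else midpoint of the KKT bounds.
        free = [g[k] for k in range(n) if eps < alpha[k] < C - eps]
        if free:
            self.rho = sum(free) / len(free)
        else:
            at_c = [g[k] for k in range(n) if alpha[k] >= C - eps]
            at_0 = [g[k] for k in range(n) if alpha[k] <= eps]
            bounds = ([max(at_c)] if at_c else []) + ([min(at_0)] if at_0 else [])
            self.rho = sum(bounds) / len(bounds)
        return self

    def decision_function(self, x: Vector) -> float:
        """Positive inside the learned support of the class, negative outside."""
        return sum(a * rbf(xi, x, self.gamma)
                   for a, xi in zip(self.alpha, self.X) if a > 0) - self.rho


class IncrementalOneClassEnsemble:
    """Hypothetical ensemble holding one one-class SVM per label seen so far."""

    def __init__(self, **svm_params):
        self.svm_params = svm_params
        self.data: Dict[str, List[List[float]]] = {}
        self.models: Dict[str, OneClassSVM] = {}

    def partial_fit(self, X: List[Vector], labels: List[str]) -> None:
        touched = set()
        for x, y in zip(X, labels):
            # An unseen label simply opens a new class; other classes are untouched.
            self.data.setdefault(y, []).append(list(x))
            touched.add(y)
        # Simplification: refit only the classes that received new samples.
        for y in touched:
            self.models[y] = OneClassSVM(**self.svm_params).fit(self.data[y])

    def predict(self, x: Vector) -> str:
        # Assumption: raw decision values are comparable across classes.
        return max(self.models, key=lambda y: self.models[y].decision_function(x))
```

For example, after `clf.partial_fit(points, labels)` on two classes, a later call with samples labeled `"c"` adds a third model without touching the first two. Keeping the per-class models independent is what makes the arrival of a new class cheap in this sketch: its cost is a single small fit on that class's samples.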
Support vector machines, Training, Accuracy, Classification algorithms, Data models, Training data, Machine learning algorithms
A. K. Ho, N. Ragot, J. Ramel, V. Eglin and N. Sidere, "Document Classification in a Non-stationary Environment: A One-Class SVM Approach," 2013 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 2013, pp. 616-620.