2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (2017)
Kyoto, Japan
Nov. 9, 2017 to Nov. 15, 2017
ISSN: 2379-2140
ISBN: 978-1-5386-3586-5
pp: 697-702
ABSTRACT
With the development of globalization, script identification has become an active field in document image processing. However, many methods recognize well only the scripts of particular countries and regions, and cannot be applied to all scripts. Research on Central Asian scripts in particular is scarce. In this paper, the Nonsubsampled Contourlet Transform (NSCT) was used to extract texture features from document images in Central Asian scripts, and a K Nearest Neighbor (KNN) classifier was used for classification. A total of 7,000 document images in 10 scripts (English, Chinese, Uyghur, Tibetan, Arabic, Turkish, Mongolian, Russian, Kazakh and Kyrgyz) were classified, with an average accuracy of 98.7%. Experimental results indicate that the proposed script identification method is effective for multi-script document images, especially for Central Asian scripts.
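The classification stage described above (texture features from transform subbands, then KNN) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the NSCT decomposition itself is not implemented here, so the subband coefficients are assumed to come from some existing NSCT routine; the per-subband statistics (mean absolute value and standard deviation), the Euclidean distance, the value of `k`, and the helper names `subband_features` and `knn_predict` are all hypothetical choices, since the abstract does not specify them.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def subband_features(subbands):
    """Hypothetical texture descriptor (assumption, not from the paper):
    for each subband (a flat list of coefficients), append the mean
    absolute value and the population standard deviation."""
    feats = []
    for coeffs in subbands:
        feats.append(mean(abs(c) for c in coeffs))
        feats.append(pstdev(coeffs))
    return feats


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Plain KNN: train is a list of (feature_vector, label) pairs.
    Returns the majority label among the k nearest training vectors;
    tied votes go to the tied label whose member is nearest the query."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    for _, label in neighbors:  # neighbors are ordered nearest first
        if votes[label] == best:
            return label


# Toy usage with made-up 2-D feature vectors (illustrative only).
train = [
    ([0.10, 0.20], "Arabic"),
    ([0.15, 0.25], "Arabic"),
    ([0.90, 0.80], "Chinese"),
    ([0.85, 0.90], "Chinese"),
    ([0.50, 0.10], "Tibetan"),
]
print(knn_predict(train, [0.12, 0.22], k=3))
```

KNN needs no training phase beyond storing labeled feature vectors, which keeps the sketch short; in practice the features would typically be normalized before computing distances, a step omitted here for brevity.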
INDEX TERMS
document image processing, feature extraction, image classification, image segmentation, image texture, natural language processing, optical character recognition, text analysis, transforms
CITATION

X. Han, A. Aysa, N. Yadikar, H. Mamat and K. Ubul, "Script Identification Based on Nonsubsampled Contourlet Transform," 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 2018, pp. 697-702.
doi:10.1109/ICDAR.2017.119