2011 International Conference on Document Analysis and Recognition (2011)
Sept. 18, 2011 to Sept. 21, 2011
Questioned Document Examination often involves the analysis of torn documents. Automatic classification of the content type of torn fragments can help a forensic expert sort similar fragments out of a pile of torn documents. One criterion of similarity is the script of the text. In this article we propose a method to identify the script in document fragments. Text in torn documents typically appears at arbitrary orientations. We therefore use a rotation-invariant Zernike moment-based feature together with a Support Vector Machine (SVM) to classify the script type. Gradient features are then used to compare the results of a rotation-dependent feature type against the rotation-invariant one. We achieved an overall script-identification accuracy of 81.39% across 11 different scripts at the character/connected-component level and 94.65% at the word level.
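To illustrate why Zernike moments suit arbitrarily oriented text, the following is a minimal sketch, not the authors' implementation. It computes Zernike moment magnitudes for a small binary image and shows that they stay the same when the image is rotated by 90 degrees. The function names (`radial_poly`, `zernike_magnitude`, `zernike_features`, `rotate90`), the maximum order of 4, and the toy glyph are all assumptions made for illustration. The abstract does not state which orders or normalization were used, and the SVM classification step is omitted here.

```python
import cmath
import math
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^|m|(rho); assumes n - |m| is even and |m| <= n."""
    m = abs(m)
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total


def zernike_magnitude(image, n, m):
    """Magnitude of a discrete Zernike moment of a square image.

    Pixel centres are mapped into the unit disk; pixels outside it are ignored.
    The constant pixel-area factor is omitted (an illustrative simplification).
    """
    size = len(image)
    centre = (size - 1) / 2.0
    radius = size / 2.0
    acc = 0j
    for row in range(size):
        for col in range(size):
            value = image[row][col]
            x = (col - centre) / radius
            y = (row - centre) / radius
            rho = math.hypot(x, y)
            if value == 0 or rho > 1.0:
                continue
            theta = math.atan2(y, x)
            acc += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return abs(acc) * (n + 1) / math.pi


def zernike_features(image, max_order=4):
    """Feature vector of |Z_nm| for all valid (n, m) with 0 <= m <= n <= max_order."""
    return [zernike_magnitude(image, n, m)
            for n in range(max_order + 1)
            for m in range(n + 1)
            if (n - m) % 2 == 0]


def rotate90(image):
    """Rotate a square image by 90 degrees about its centre."""
    return [list(r) for r in zip(*image[::-1])]


# Hypothetical 8x8 "L"-shaped glyph standing in for a connected component.
glyph = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    original = zernike_features(glyph)
    rotated = zernike_features(rotate90(glyph))
    # The two vectors should agree up to floating-point rounding.
    print(max(abs(a - b) for a, b in zip(original, rotated)))
```

Rotating the image only multiplies each complex moment by a unit-magnitude phase factor, so the magnitudes are unchanged. A 90-degree rotation maps the pixel grid onto itself, which makes the invariance hold up to rounding in this sketch; for arbitrary angles, resampling would introduce small deviations. A vector like this could then be passed to an SVM, for instance one with a Gaussian kernel as the keywords suggest.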
Script Identification, Torn Document, Gaussian Kernel SVM, Computational Forensics
U. Pal, K. Franke and S. Chanda, "Identification of Indic Scripts on Torn-Documents," 2011 International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 2011, pp. 713-717.