Frontiers in Handwriting Recognition, International Conference on (2010)
Nov. 16, 2010 to Nov. 18, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICFHR.2010.12
Arbitrary orientation and sparse data content are common characteristics of torn documents. To ensure accuracy and reliability in computer-based analysis, content-zone segmentation is required. In our previous work, we studied the segmentation of handwritten and printed text. A questioned document piece in the form of an office note, however, might also contain non-text data such as logos, graphics, and pictures. Hence, a more precise content-zone classification is required. In this paper, we propose a two-tier approach for segmenting non-text, handwritten, and printed text content. The first tier discriminates between text and non-text regions. The second tier classifies the text zones identified in the first tier as either handwritten or printed. Gabor features and chain-code features are used in Tier-1 and Tier-2, respectively. Using an SVM classifier, we correctly identified 97.65% of the 31,227 text regions in our current test data. Among all identified text regions, the proposed approach identified 98.69% of the printed and 96.39% of the handwritten text.
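The two-tier control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_zones`, the dictionary keys `gabor` and `chain_code`, and the two threshold-based stand-in classifiers are all hypothetical and exist only to show how a zone is routed through Tier-1 and, if it is text, through Tier-2. In practice each stand-in would be replaced by a trained SVM operating on real Gabor and chain-code feature vectors.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def classify_zones(
    zones: List[Dict[str, Features]],
    tier1: Classifier,
    tier2: Classifier,
) -> List[str]:
    """Label each zone as 'non-text', 'printed', or 'handwritten'.

    Tier-1 sees the (assumed precomputed) Gabor features and decides
    text vs. non-text. Only zones labelled as text reach Tier-2, which
    sees the (assumed precomputed) chain-code features.
    """
    labels = []
    for zone in zones:
        if tier1(zone["gabor"]) == "non-text":
            labels.append("non-text")
        else:
            labels.append(tier2(zone["chain_code"]))
    return labels


# Hypothetical stand-ins for the two trained SVMs. The thresholds are
# arbitrary and carry no meaning beyond making the example runnable.
def toy_tier1(gabor: Features) -> str:
    return "text" if sum(gabor) / len(gabor) > 0.5 else "non-text"


def toy_tier2(chain_code: Features) -> str:
    return "printed" if max(chain_code) > 0.4 else "handwritten"


# Made-up feature vectors for three zones (illustration only).
zones = [
    {"gabor": [0.1, 0.2], "chain_code": [0.25, 0.25, 0.25, 0.25]},
    {"gabor": [0.8, 0.9], "chain_code": [0.6, 0.1, 0.2, 0.1]},
    {"gabor": [0.7, 0.6], "chain_code": [0.3, 0.2, 0.3, 0.2]},
]

labels = classify_zones(zones, toy_tier1, toy_tier2)
print(labels)  # ['non-text', 'printed', 'handwritten']
```

Because Tier-2 only ever receives zones that Tier-1 accepted as text, the per-class rates reported for printed and handwritten text are measured relative to the identified text regions, not to all zones.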
Torn Document Recognition, Text Classification, Printed and Handwritten Text Segmentation, Text Graphics Segmentation
K. Franke, S. Chanda and U. Pal, "Document-Zone Classification in Torn Documents," Frontiers in Handwriting Recognition, International Conference on (ICFHR), Kolkata, India, 2010, pp. 25-30.