Frontiers in Handwriting Recognition, International Conference on (2010)
Kolkata, India
Nov. 16, 2010 to Nov. 18, 2010
ISBN: 978-0-7695-4221-8
pp: 25-30
ABSTRACT
Arbitrary orientation and sparse data content are common characteristics of torn documents. To ensure accuracy and reliability in computer-based analysis, content-zone segmentation is required. In our previous work, we studied segmentation of handwritten and printed text. A questioned document piece in the form of an office note, however, might also contain non-text data such as logos, graphics, and pictures. Hence, a more precise content-zone classification is required. In this paper, we propose a two-tier approach for segmenting non-text content, handwriting, and printed text. The first tier discriminates between text and non-text regions. The second tier classifies the text zones identified in the first tier as handwritten or printed. Gabor features and chain-code features are used in Tier-1 and Tier-2, respectively. Using an SVM classifier, we successfully identified 97.65% of the 31,227 text regions in our current test data. The proposed approach identified 98.69% of printed and 96.39% of handwritten text among all identified text regions.
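The two-tier flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the Gabor or chain-code features are computed, so the sketch assumes a plain 8-bin Freeman chain-code histogram as a stand-in for the Tier-2 features and takes the feature extractors and classifiers as caller-supplied functions. All names (`chain_code`, `chain_code_histogram`, `classify_zone`, and the parameters) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]
Features = Sequence[float]

# Assumed convention: Freeman 8-direction codes in image coordinates
# (x grows to the right, y grows downward). Code 0 is east and codes
# increase counter-clockwise on screen, so code 2 is north (dy = -1).
FREEMAN_CODES = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour: Sequence[Point]) -> List[int]:
    """Encode each step between consecutive contour points as a Freeman code."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_CODES:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-connected")
        codes.append(FREEMAN_CODES[step])
    return codes


def chain_code_histogram(contour: Sequence[Point]) -> List[float]:
    """Return the normalized 8-bin histogram of chain-code directions."""
    codes = chain_code(contour)
    if not codes:
        return [0.0] * 8
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    return [n / len(codes) for n in counts]


def classify_zone(
    zone,
    tier1_features: Callable[[object], Features],
    is_text: Callable[[Features], bool],
    tier2_features: Callable[[object], Features],
    is_handwritten: Callable[[Features], bool],
) -> str:
    """Two-tier cascade: text vs. non-text first, then handwritten vs. printed."""
    # Tier 1: non-text zones are rejected and never reach the second tier.
    if not is_text(tier1_features(zone)):
        return "non-text"
    # Tier 2: runs only on zones that Tier 1 accepted as text.
    if is_handwritten(tier2_features(zone)):
        return "handwritten"
    return "printed"
```

In a real system, `is_text` and `is_handwritten` would wrap trained SVM decision functions and `tier1_features` would compute Gabor filter responses; neither is shown here. The point of the cascade is that Tier-2 features are computed only for zones that Tier 1 has already labeled as text.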
INDEX TERMS
Torn Document Recognition, Text Classification, Printed and Handwritten Text Segmentation, Text Graphics Segmentation
CITATION

K. Franke, S. Chanda and U. Pal, "Document-Zone Classification in Torn Documents," Frontiers in Handwriting Recognition, International Conference on (ICFHR), Kolkata, India, 2010, pp. 25-30.
doi:10.1109/ICFHR.2010.12