Frontiers in Handwriting Recognition, International Conference on (2010)
Kolkata, India
Nov. 16, 2010 to Nov. 18, 2010
ISBN: 978-0-7695-4221-8
pp: 25-30
Arbitrary orientation and sparse data content are common characteristics of torn documents. To ensure accuracy and reliability in computer-based analysis, content-zone segmentation is required. In our previous work, we studied segmentation of handwritten and printed text. A questioned document piece in the form of an office note, however, might also contain non-text data such as logos, graphics, and pictures. Hence a more precise content-zone classification is required. In this paper we propose a two-tier approach for segmenting non-text, handwritten, and printed text regions. The first tier discriminates between text and non-text regions. The second tier classifies the text zones identified in the first tier as handwritten or printed. Gabor features and chain-code features are used in Tier-1 and Tier-2, respectively. Using an SVM classifier, we successfully identified 97.65% of the 31,227 text regions in our current test data. The proposed approach identified 98.69% of printed and 96.39% of handwritten text among all identified text regions.
Torn Document Recognition, Text Classification, Printed and Handwritten Text Segmentation, Text Graphics Segmentation
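
The two-tier scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the Gabor or chain-code features are computed, so the code below assumes a generic Freeman 8-direction chain-code histogram as a stand-in Tier-2 feature, and it takes the two classifiers as hypothetical callables (`is_text`, `is_handwritten`) in place of trained SVMs. All function names are illustrative.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]

# Freeman 8-direction codes keyed by (dx, dy) in image coordinates
# (y grows downward). Code 0 points east; codes increase counter-clockwise.
FREEMAN = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour: Sequence[Point]) -> List[int]:
    """Freeman chain code of an ordered, 8-connected contour of (x, y) points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(FREEMAN[step])
    return codes


def chain_code_histogram(contour: Sequence[Point]) -> List[float]:
    """Normalised 8-bin direction histogram (an assumed Tier-2 feature)."""
    codes = chain_code(contour)
    counts = Counter(codes)
    total = len(codes) or 1  # avoid division by zero for degenerate contours
    return [counts.get(direction, 0) / total for direction in range(8)]


def classify_zone(
    region: object,
    is_text: Callable[[object], bool],
    is_handwritten: Callable[[object], bool],
) -> str:
    """Two-tier cascade: Tier-1 separates text from non-text,
    Tier-2 runs only on regions that Tier-1 accepted as text."""
    if not is_text(region):          # Tier-1 (hypothetical classifier)
        return "non-text"
    if is_handwritten(region):       # Tier-2 (hypothetical classifier)
        return "handwritten"
    return "printed"
```

In a fuller pipeline, `is_text` and `is_handwritten` would wrap trained classifiers applied to Tier-1 and Tier-2 feature vectors respectively. The cascade structure means Tier-2 errors are only counted among regions that Tier-1 already labelled as text, which matches how the accuracy figures in the abstract are reported.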

K. Franke, S. Chanda and U. Pal, "Document-Zone Classification in Torn Documents," Frontiers in Handwriting Recognition, International Conference on (ICFHR), Kolkata, India, 2010, pp. 25-30.