1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96)
A fast and flexible statistical method for text extraction in document pages
San Francisco, CA
June 18-20, 1996
ISBN: 0-8186-7258-7
This paper describes a fast and flexible method for extracting text regions from a document page containing text, graphics, and pictures. Such regions can be given as input to an OCR system. The user fixes two parameters according to the needs of the implementation: the minimum width w of the text to be detected and the required precision ε, both expressed as a percentage of the image width. The method works by subdividing the page into overlapping columns whose width and inter-shift depend on w and ε, and by performing text-line extraction on each column separately. Subsequently, a statistical analysis of the text-line elements found in each column is performed, and they are connected to form complete text lines. Finally, related pieces of text are merged into blocks so that a sensible reading order is provided to the OCR system. The algorithm is very fast, works on low-resolution document pages, and is robust against skew. It is also very flexible: no assumptions are made about the layout of the document, the shape of the text regions, or the font size and style; the main assumptions are that the background is uniform and the text is approximately horizontal. Despite the statistical nature of the method, a single line of text of a given font size is generally sufficient to warrant detection. Experimental results demonstrate the effectiveness of the method on several different kinds of documents.
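The first two steps of the abstract (overlapping columns, then per-column text-line extraction) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how the column width and inter-shift are derived from the two parameters, so the sketch simply assumes the width equals w percent and the shift equals the precision percent of the page width. The per-column step uses a plain horizontal projection profile as a stand-in for the paper's statistical analysis. The names `column_spans`, `text_line_runs`, `w_pct`, `eps_pct`, and `min_ink` are hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) with end exclusive


def column_spans(image_width: int, w_pct: float, eps_pct: float) -> List[Span]:
    """Split the page width into overlapping column spans.

    Assumption (not from the paper): column width is w_pct percent of the
    page width and consecutive columns are shifted by eps_pct percent.
    """
    width = max(1, round(image_width * w_pct / 100.0))
    shift = max(1, round(image_width * eps_pct / 100.0))
    spans: List[Span] = []
    start = 0
    while start + width < image_width:
        spans.append((start, start + width))
        start += shift
    # Align the last column to the right edge so the whole page is covered.
    spans.append((max(0, image_width - width), image_width))
    return spans


def text_line_runs(image: Sequence[Sequence[int]], span: Span,
                   min_ink: int = 1) -> List[Span]:
    """Find vertical runs of rows that contain ink inside one column.

    `image` is a binary page (1 = ink, 0 = uniform background). A row counts
    as text when it has at least `min_ink` ink pixels within the column.
    This projection-profile rule is a simplification chosen for the sketch.
    """
    start, end = span
    runs: List[Span] = []
    run_start = None
    for y, row in enumerate(image):
        ink = sum(row[start:end])
        if ink >= min_ink:
            if run_start is None:
                run_start = y  # a new candidate text-line element begins
        elif run_start is not None:
            runs.append((run_start, y))  # the element ended on the previous row
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(image)))
    return runs


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
    ]
    for span in column_spans(image_width=4, w_pct=50, eps_pct=25):
        print(span, text_line_runs(page, span))
```

Because the columns overlap, a text line narrower than the page still falls almost entirely inside at least one column, which is presumably why the shift is tied to the required precision. Connecting the per-column elements into complete lines and merging them into blocks is left out, since the abstract gives no detail on those steps.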
Citation:
Pietro Parodi, Giulia Piccioli, "A fast and flexible statistical method for text extraction in document pages," 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), p. 619, 1996.