16th International Conference on Pattern Recognition (ICPR'02) - Volume 3
Unsupervised Clustering of Text Entities in Heterogeneous Grey Level Documents
Quebec City, QC, Canada
August 11-15, 2002
ISBN: 0-7695-1695-X
Stéphane Bres, INSA de Lyon
Véronique Eglin, INSA de Lyon
Antoine Gagneux, INSA de Lyon
This paper presents a new method for the functional classification of text blocks in a document. It is based on texture analysis and unsupervised classification. Texture is used here to define different classes of text blocks in the document and to suggest an order of exploration, from the most eye-catching data to the least significant text block. The typographical properties of blocks are characterized by two main discriminating primitives: the complexity of the text drawing and the structural relief of the block. This analysis is the starting point of a three-class categorization into functional families (main headings, sub-headings, and text paragraphs). Each block of text is described and classified through a labeling process based on a 3D feature space using the two previous features (complexity and structural relief) and a third one chosen among pattern primitives, block size, and location in the document. This method provides a first approach to a global, context-free classification of documents.
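The abstract does not say which clustering algorithm performs the unsupervised classification. As an illustration only, the sketch below groups blocks into three clusters in a 3D feature space using plain k-means, which is a stand-in technique and not necessarily the authors' method. The feature values, the choice of block size as the third feature, and the function name `kmeans` are all assumptions made for the example.

```python
from math import dist

def kmeans(points, k=3, iters=50):
    """Plain k-means on tuples of floats (stand-in, not the paper's algorithm)."""
    # Deterministic seeding: start from the first point, then repeatedly
    # add the point that lies farthest from all seeds chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each block goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers

# Hypothetical blocks: (complexity, structural_relief, relative_size),
# each scaled to [0, 1]. These numbers are invented for illustration.
blocks = [
    (0.90, 0.85, 0.80), (0.88, 0.90, 0.75),                      # heading-like
    (0.55, 0.50, 0.40), (0.50, 0.55, 0.45),                      # sub-heading-like
    (0.10, 0.15, 0.10), (0.12, 0.10, 0.15), (0.15, 0.12, 0.12),  # paragraph-like
]
labels, centers = kmeans(blocks, k=3)
print(labels)
```

The cluster indices carry no meaning by themselves. Mapping them to main headings, sub-headings, and paragraphs would need an extra rule, for example ranking clusters by the mean of their first two features, which is again an assumption rather than something the abstract specifies.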
Citation:
Stéphane Bres, Véronique Eglin, Antoine Gagneux, "Unsupervised Clustering of Text Entities in Heterogeneous Grey Level Documents," icpr, vol. 3, pp.30224, 16th International Conference on Pattern Recognition (ICPR'02) - Volume 3, 2002