Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
Clustering document images using a bag of symbols representation
Seoul, Korea
August 31-September 01
ISBN: 0-7695-2420-6
Eugen Barbu, CNRS FRE 2645 - Universite de Rouen, France
Pierre Heroux, CNRS FRE 2645 - Universite de Rouen, France
Sebastien Adam, CNRS FRE 2645 - Universite de Rouen, France
Eric Trupin, CNRS FRE 2645 - Universite de Rouen, France

Document image classification is an important step in document image analysis. Classification results support other tasks such as indexing, understanding, or navigation in document collections. Using a document representation and an unsupervised classification method, we can group documents into clusters that are valid from the user's point of view. However, the semantic gap between a domain-independent document representation and the user's implicit representation can lead to unsatisfactory results.

In this paper we describe document images based on frequently occurring symbols. This document description is created in an unsupervised manner and can be related to the domain knowledge. Using data mining techniques applied to a graph-based document representation, we find frequent and maximal subgraphs. For each document image, we construct a bag containing the frequent subgraphs found in it. This bag of "symbols" represents the description of a document. We present results obtained on a corpus of 60 graphical document images.
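
As a rough illustration of the bag of "symbols" idea, the following minimal Python sketch assumes that frequent subgraph mining has already been done and that each document is given as a list of symbol labels. The labels, the document names, the cosine similarity measure, and the greedy threshold grouping are all hypothetical choices made for this example; the abstract does not state which similarity measure or clustering method the authors use.

```python
from collections import Counter
from math import sqrt


def bag_of_symbols(symbol_ids):
    """Count how often each frequent-subgraph label occurs in one document."""
    return Counter(symbol_ids)


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two bags (assumed measure, not from the paper)."""
    dot = sum(count * bag_b[sym] for sym, count in bag_a.items())
    norm_a = sqrt(sum(c * c for c in bag_a.values()))
    norm_b = sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(bags, threshold=0.5):
    """Greedy threshold grouping: a document joins the first cluster that
    already holds a sufficiently similar member, else it starts a new one."""
    clusters = []
    for name, bag in bags.items():
        for members in clusters:
            if any(cosine_similarity(bag, bags[m]) >= threshold for m in members):
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical input: symbol labels found in each document image.
documents = {
    "plan_1": ["door", "window", "door", "stairs"],
    "plan_2": ["door", "window", "window"],
    "schematic_1": ["resistor", "capacitor", "resistor"],
    "schematic_2": ["capacitor", "resistor", "ground"],
}

bags = {name: bag_of_symbols(symbols) for name, symbols in documents.items()}
clusters = group_documents(bags)
print(clusters)
```

With this toy input the two architectural plans end up in one group and the two electrical schematics in another, because documents sharing no symbols have a similarity of zero.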

Citation:
Eugen Barbu, Pierre Heroux, Sebastien Adam, Eric Trupin, "Clustering document images using a bag of symbols representation," icdar, pp.1216-1220, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005