Second International Conference on Document Image Analysis for Libraries (DIAL'06)
Tree clustering for layout-based document image retrieval
Lyon, France
April 27-28, 2006
ISBN: 0-7695-2531-8
We describe a system for retrieving document images from digital library collections on the basis of layout similarity. Layout regions are extracted from each page and represented with an XY tree. The proposed indexing method combines a new tree clustering algorithm (based on Self-Organizing Maps) with Principal Component Analysis. Together, these techniques allow us to retrieve the most similar pages from large collections without directly comparing the query page with every indexed document.
Citation:
Simone Marinai, Emanuele Marino, Giovanni Soda, "Tree clustering for layout-based document image retrieval," Second International Conference on Document Image Analysis for Libraries (DIAL'06), pp. 243-253, 2006