Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
Document Ranking by Layout Relevance
Seoul, Korea
August 31-September 01
ISBN: 0-7695-2420-6
May Huang, University of Maryland
Daniel DeMenthon, University of Maryland
David Doermann, University of Maryland
Lynn Golebiowski, Booz Allen Hamilton, 134 National Business Parkway, Annapolis Junction, MD
This paper describes the development of a new document ranking system based on layout similarity. The user has a need represented by a set of "wanted" documents, and the system ranks documents in the collection according to this need. Rather than performing complete document analysis, the system extracts text lines, and models layouts as relationships between pairs of these lines. This paper explores three novel feature sets to support scoring in large document collections. First, pairs of lines are used to form quadrilaterals, which are represented by their turning functions. A non-Euclidean distance is used to measure similarity. Second, the quadrilaterals are represented by 5D Euclidean vectors, and third, each line is represented by a 5D Euclidean vector. We compare the classification performance and computation speed of these three feature sets using a large database of diverse documents including forms, academic papers and handwritten pages in English and Arabic. The approach using quadrilaterals and turning functions produces slightly better results, but the approach using vectors to represent text lines is much faster for large document databases.
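To make the first feature set more concrete, the following is a minimal sketch of a turning-function comparison between two quadrilaterals. It is not the authors' implementation. The abstract does not say how text lines are reduced to geometry or which non-Euclidean distance is used, so the sketch assumes that each text line is a segment with two endpoints, that a quadrilateral joins the endpoints of two such segments, and that similarity is the L2 distance between the two step functions from a fixed starting vertex. The names `quad_from_lines`, `turning_function` and `turning_distance` are hypothetical.

```python
import math

def quad_from_lines(line_a, line_b):
    """Join two segments ((x1, y1), (x2, y2)) into a quadrilateral.

    Assumption: line_a is walked start to end and line_b end to start,
    so that the four points form a simple (non-crossing) polygon.
    """
    return [line_a[0], line_a[1], line_b[1], line_b[0]]

def turning_function(vertices):
    """Return (breakpoints, angles) for a closed polygon.

    breakpoints[i] is the normalized arc length where edge i starts;
    angles[i] is the cumulative tangent angle along edge i, in radians.
    """
    n = len(vertices)
    edges = [(vertices[(i + 1) % n][0] - vertices[i][0],
              vertices[(i + 1) % n][1] - vertices[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)
    breakpoints, angles = [], []
    s = 0.0
    theta = math.atan2(edges[0][1], edges[0][0])
    for i in range(n):
        if i > 0:
            # Signed turn between the previous edge and the current edge.
            px, py = edges[i - 1]
            cx, cy = edges[i]
            theta += math.atan2(px * cy - py * cx, px * cx + py * cy)
        breakpoints.append(s)
        angles.append(theta)
        s += lengths[i] / perimeter
    return breakpoints, angles

def turning_distance(poly_a, poly_b):
    """L2 distance between two turning functions over arc length [0, 1]."""
    bp_a, ang_a = turning_function(poly_a)
    bp_b, ang_b = turning_function(poly_b)
    cuts = sorted(set(bp_a + bp_b + [1.0]))
    total = 0.0
    ia = ib = 0
    for lo, hi in zip(cuts, cuts[1:]):
        # Advance each index to the step that covers [lo, hi).
        while ia + 1 < len(bp_a) and bp_a[ia + 1] <= lo:
            ia += 1
        while ib + 1 < len(bp_b) and bp_b[ib + 1] <= lo:
            ib += 1
        total += (ang_a[ia] - ang_b[ib]) ** 2 * (hi - lo)
    return math.sqrt(total)

# Two equal-length lines stacked one unit apart form a unit square.
square = quad_from_lines(((0, 0), (1, 0)), ((0, 1), (1, 1)))
# Longer lines at the same spacing form a 2-by-1 rectangle.
rectangle = quad_from_lines(((0, 0), (2, 0)), ((0, 1), (2, 1)))
print(turning_distance(square, rectangle))
```

Because arc length is normalized by the perimeter, the function is unchanged by translation and uniform scaling, which is one reason such a representation suits layouts scanned at different sizes. A fuller treatment would also minimize the distance over the starting vertex and over rotation, which this sketch omits.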
Citation:
May Huang, Daniel DeMenthon, David Doermann, Lynn Golebiowski, "Document Ranking by Layout Relevance," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), pp. 362-366, 2005