Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2
Example-Based Logical Labeling of Document Title Page Images
Curitiba, Parana, Brazil
September 23-September 26
ISBN: 0-7695-2822-8
D. Keysers, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
F. Shafait, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
This paper presents a flexible and effective example-based approach for labeling title pages, which can be used for automated extraction of bibliographic data. The labels of interest are "Title", "Author", "Abstract" and "Affiliation". The method takes a set of labeled document layouts and a single unlabeled document layout as input and finds the best-matching layout in the set. The labels of this layout are then used to label the new layout. The similarity measure for layouts combines structural layout similarity and textural similarity at the block level. Experiments on the publicly available MARG dataset yield accuracy rates from 94.8% to 99.6%. This shows that our lightweight method performs as well as, and in some cases better than, other more complex labeling methods known from the literature.
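The matching-and-transfer idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so everything below is an assumption made for illustration: the `Block` type, the use of bounding-box overlap (intersection over union) as the structural term, a single `text_height` value as a stand-in for a block-level textural feature, and the weight `w`. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to the page


@dataclass
class Block:
    box: Box
    text_height: float            # hypothetical stand-in for a block-level feature
    label: Optional[str] = None   # None for blocks of the unlabeled layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def block_cost(q: Block, r: Block, w: float = 0.5) -> float:
    """Assumed cost: structural term (1 - IoU) plus weighted feature difference."""
    return (1.0 - iou(q.box, r.box)) + w * abs(q.text_height - r.text_height)


def best_match(q: Block, layout: List[Block], w: float = 0.5) -> Block:
    """Reference block with the lowest cost for query block q."""
    return min(layout, key=lambda r: block_cost(q, r, w))


def layout_distance(query: List[Block], ref: List[Block], w: float = 0.5) -> float:
    """Mean cost of each query block against its best-matching reference block."""
    return sum(block_cost(q, best_match(q, ref, w), w) for q in query) / len(query)


def label_layout(query: List[Block], references: List[List[Block]],
                 w: float = 0.5) -> List[Optional[str]]:
    """Pick the closest labeled layout and copy its labels onto the query blocks."""
    best = min(references, key=lambda ref: layout_distance(query, ref, w))
    return [best_match(q, best, w).label for q in query]


# Two made-up labeled layouts and one unlabeled layout.
ref_a = [
    Block((0.1, 0.05, 0.9, 0.12), 0.030, "Title"),
    Block((0.2, 0.14, 0.8, 0.18), 0.015, "Author"),
    Block((0.1, 0.25, 0.9, 0.50), 0.010, "Abstract"),
]
ref_b = [
    Block((0.10, 0.05, 0.5, 0.15), 0.030, "Title"),
    Block((0.55, 0.05, 0.9, 0.15), 0.010, "Affiliation"),
    Block((0.10, 0.20, 0.5, 0.25), 0.015, "Author"),
    Block((0.10, 0.30, 0.9, 0.60), 0.010, "Abstract"),
]
query = [
    Block((0.1, 0.06, 0.9, 0.13), 0.030),
    Block((0.2, 0.15, 0.8, 0.19), 0.015),
    Block((0.1, 0.26, 0.9, 0.50), 0.010),
]

labels = label_layout(query, [ref_a, ref_b])
print(labels)
```

In this toy example the query layout is closest to `ref_a`, so its three blocks receive the labels of the overlapping `ref_a` blocks. Because each query block is matched independently, two query blocks may receive the same label; a one-to-one assignment would be a different design choice.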
Citation:
J. van Beusekom, D. Keysers, F. Shafait, T. Breuel, "Example-Based Logical Labeling of Document Title Page Images," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 919-923, 2007