Fourth International Conference Document Analysis and Recognition (ICDAR'97)
Logical Structure Analysis of Book Document Images Using Contents Information
Ulm, Germany, August 18-20, 1997
ISBN: 0-8186-7898-4
Document image structure analysis has been studied extensively, with particular emphasis on media conversion and layout analysis. To convert a library's collection of books into hypertext documents, logical structure extraction is indispensable in addition to document layout analysis. The table of contents of a book generally gives a concise and faithful representation of the logical structure of the entire book. We can therefore analyze the logical structure of a book efficiently by making full use of its contents pages. This paper proposes a new approach to document logical structure analysis that converts document images and contents information into an electronic document. First, the contents pages of a book are analyzed to acquire the overall logical structure of the document. This information is then used to acquire the logical structure of all pages of the book by analyzing consecutive pages of a portion of the book. Test results show very high discrimination rates: up to 97.6% for the headline structure, 99.4% for the text structure, 97.8% for the page number structure, and almost 100% for the head-foot structure.
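The idea of driving page analysis from the contents pages can be illustrated with a minimal sketch. It is not the authors' method: it assumes the contents pages have already been converted to text lines, and the line shape (section number, title, dot leaders, page number) and the names `TocEntry`, `parse_toc`, `expected_headlines`, and `enclosing_entry` are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) line shape: "<section number> <title> .... <page>"
TOC_LINE = re.compile(r"^\s*(\d+(?:\.\d+)*)\s+(.+?)\s*\.{2,}\s*(\d+)\s*$")


@dataclass
class TocEntry:
    number: str  # section number as printed, e.g. "1.2"
    title: str   # headline text
    page: int    # page on which the headline is expected
    level: int   # depth in the hierarchy, 1 = top level


def parse_toc(lines):
    """Turn recognized contents-page lines into a flat list of entries."""
    entries = []
    for line in lines:
        m = TOC_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like contents entries
        number, title, page = m.groups()
        level = number.count(".") + 1
        entries.append(TocEntry(number, title, int(page), level))
    return entries


def expected_headlines(entries, page):
    """Entries whose headline should appear on the given page."""
    return [e for e in entries if e.page == page]


def enclosing_entry(entries, page):
    """Last entry (in contents order) that starts at or before the page."""
    started = [e for e in entries if e.page <= page]
    return started[-1] if started else None
```

With entries parsed this way, a body page can be checked against `expected_headlines` to decide whether a headline candidate is plausible, and `enclosing_entry` gives the section a page of running text belongs to, assuming the entries are listed in page order.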
Citation:
ChunChen Lin, Yosihiro Niwa, Seinosuke Narita, "Logical Structure Analysis of Book Document Images Using Contents Information," icdar, pp.1048, Fourth International Conference Document Analysis and Recognition (ICDAR'97), 1997