International Conference on Semantic Computing (ICSC 2007)
Intelligent Parsing of Scanned Volumes for Web Based Archives
Irvine, California
September 17-19, 2007
ISBN: 0-7695-2997-6
Xiaonan Lu, The Pennsylvania State University, USA
James Z. Wang, The Pennsylvania State University, USA
C. Lee Giles, The Pennsylvania State University, USA
The proliferation of digital libraries and the large number of existing documents raise important issues for the efficient handling of documents. Printed text in documents needs to be converted into digital format, and semantic information needs to be parsed and managed for effective retrieval. In this work, we attempt to address the problems faced by current web-based archives, where large-scale repositories of electronic resources have been built from scanned volumes. Specifically, we focus on the scientific domain and target scanned volumes of scientific publications. Our goal is to automate the semantic processing of scanned volumes, an important and challenging step toward efficient retrieval of the content within them. We tackle the problem by designing a machine-learning-based method that extracts multi-level metadata about the content of scanned volumes, combining image and text information within the volumes for intelligent parsing. We developed a system and tested it with real-world data from the Internet Archive, and the experimental evaluation demonstrated good results.
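The abstract does not specify the learning model or the features used, so the following is only a loose illustration of the general idea of combining image and text cues to label pages of a scanned volume. Everything in it is a hypothetical assumption, not the authors' method: the page types ("toc", "body"), the image cue (`ink_ratio`, a fraction of dark pixels), the two text cues, and the simple nearest-centroid classifier that stands in for whatever model the paper actually uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    ink_ratio: float  # hypothetical image cue: fraction of dark pixels on the page
    text: str         # OCR text of the page


def text_features(text):
    """Hypothetical text cues: share of lines ending in a digit, and a 'contents' keyword flag."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return [0.0, 0.0]
    ends_with_digit = sum(line.rstrip()[-1].isdigit() for line in lines) / len(lines)
    has_contents = 1.0 if "contents" in text.lower() else 0.0
    return [ends_with_digit, has_contents]


def features(page):
    """Combine the image cue and the text cues into one feature vector."""
    return [page.ink_ratio] + text_features(page.text)


def train_centroids(pages, labels):
    """Nearest-centroid 'training': average the feature vectors of each label."""
    sums, counts = {}, {}
    for page, label in zip(pages, labels):
        f = features(page)
        if label not in sums:
            sums[label] = [0.0] * len(f)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], f)]
        counts[label] += 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def classify(page, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    f = features(page)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


toc = Page(0.05, "Contents\nIntroduction 1\nMethods 12\nResults 30")
body = Page(0.12, "The experiment was conducted\nover several weeks and the\nresults are discussed below.")
centroids = train_centroids([toc, body], ["toc", "body"])
print(classify(Page(0.06, "Table of Contents\nPreface 3\nChapter One 9"), centroids))
```

Running the usage lines prints `toc`: the query page has mostly digit-terminated lines and mentions "contents", so it lands near the table-of-contents centroid. A real system would use richer layout and text features and a stronger classifier; nearest-centroid is chosen here only because it fits in a few self-contained lines.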
Citation:
Xiaonan Lu, James Z. Wang, C. Lee Giles, "Intelligent Parsing of Scanned Volumes for Web Based Archives," International Conference on Semantic Computing (ICSC 2007), pp. 559-568, 2007.