2010 22nd IEEE International Conference on Tools with Artificial Intelligence (2010)
Arras, France
Oct. 27, 2010 to Oct. 29, 2010
ISSN: 1082-3409
ISBN: 978-0-7695-4263-8
pp: 345-346
In recent years, the importance of accessing and acquiring information made available by Web pages and business documents has grown considerably. Consequently, wrapping information from documents in HTML and PDF formats is receiving increasing interest. In this paper we present a textual query language, named ViQueL, that allows for querying information in both Web and PDF documents on the basis of its spatial arrangement. The proposed language is founded on spatial grammars, i.e., context-free grammars extended with spatial constructs. The main feature of ViQueL is that it makes it possible to identify and extract relevant information from HTML and PDF documents on the basis of their visual appearance by using easy-to-write queries. Despite its considerable expressive power, the combined complexity of ViQueL is in PTIME. Moreover, experiments show that ViQueL is reasonably efficient for real-life extraction tasks.
Information Extraction, Wrapping, Qualitative Spatial Reasoning, Context Free Grammars

