Atlanta, Georgia
Apr. 3, 2006 to Apr. 7, 2006
ISBN: 0-7695-2571-7
pp: 81
Jan Hegewald, Humboldt-Universität zu Berlin
Felix Naumann, Humboldt-Universität zu Berlin
Melanie Weis, Humboldt-Universität zu Berlin
ABSTRACT
XML is the de facto standard format for data exchange on the Web. While it is fairly simple to generate XML data, it is a complex task to design a schema and then guarantee that the generated data is valid according to that schema. As a consequence, much XML data does not have a schema or is not accompanied by its schema. In order to gain the benefits of having a schema (efficient querying and storage of XML data, semantic verification, data integration, etc.), this schema must be extracted.

In this paper we present an automatic technique, XStruct, for XML Schema extraction. Based on ideas of [5], XStruct extracts a schema for XML data by applying several heuristics to deduce regular expressions that are 1-unambiguous and describe each element's contents correctly but generalized to a reasonable degree. Our approach features several advantages over known techniques: XStruct scales to very large documents (beyond 1GB) both in time and memory consumption; it is able to extract a general, complete, correct, minimal, and understandable schema for multiple documents; it detects datatypes and attributes. Experiments confirm these features and properties.
CITATION
Jan Hegewald, Felix Naumann, Melanie Weis, "XStruct: Efficient Schema Extraction from Multiple and Large XML Documents", International Conference on Data Engineering Workshops (ICDEW), 2006, pp. 81, doi:10.1109/ICDEW.2006.166