1999 International Symposium on Database Applications in Non-Traditional Environments (DANTE'99) An Approximation Method for Extracting Typical Classes from Semistructured Data Kyoto, Japan November 28-November 30 ISBN: 0-7695-0496-5
We consider a class extraction problem over semistructured data. A class C is extracted by grouping objects having similar (not necessarily identical) sets of properties into C, where the set of properties of C is the union of those of the objects in C. Let C be an extracted class and o be an object in C. If C has property P but o has no property P value, then P is null within o. An extracted class C is called typical if the number of nulls in C is small against the number of objects in C and the number of properties of C. We present the following results. First, we prove that the problem of deciding if a typical class can be extracted from given semistructured data is NP-complete. Second, we present an approximation algorithm for extracting typical classes from given semistructured data. Finally, we briefly discuss a sufficient condition for the approximation algorithm to run efficiently.
Index Terms:
semistructured data, schema extraction, approximation algorithm
Citation:
Nobutaka Suzuki, Yoichirou Sato, Michiyoshi Hayase, "An Approximation Method for Extracting Typical Classes from Semistructured Data," dante, pp.197, 1999 International Symposium on Database Applications in Non-Traditional Environments (DANTE'99), 1999 Usage of this product signifies your acceptance of the Terms of Use. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||