2014 IEEE 30th International Conference on Data Engineering (ICDE) (2014)
Chicago, IL, USA
March 31, 2014 to April 4, 2014
Ziawasch Abedjan, Hasso Plattner Institute (HPI), Potsdam, Germany
Toni Gruetze, Hasso Plattner Institute (HPI), Potsdam, Germany
Anja Jentzsch, Hasso Plattner Institute (HPI), Potsdam, Germany
Felix Naumann, Hasso Plattner Institute (HPI), Potsdam, Germany
Before an organization can reap the benefits of open data to add value to its internal data, these new, external datasets must be analyzed and understood, starting at the basic level of data types, constraints, value patterns, etc. Such data profiling, already difficult for large relational data sources, is even more challenging for RDF datasets, RDF being the preferred data model for linked open data. We present ProLOD++, a novel tool for various profiling and mining tasks to understand and ultimately improve open RDF data. ProLOD++ comprises various traditional data profiling tasks, adapted to the RDF data model. In addition, it provides many profiling results specific to open data, such as schema discovery for user-generated attributes, association rule discovery to uncover synonymous predicates, and uniqueness discovery along ontology hierarchies. ProLOD++ is highly efficient, allowing interactive profiling for users interested in exploring the properties and structure of yet unknown datasets.
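The abstract names association rule discovery over RDF without describing the setup. The following is a minimal Python sketch that assumes one common configuration: each subject forms a transaction, and the predicates used with it are the items. The function names, thresholds, and `ex:` resources are hypothetical, and this is not ProLOD++'s implementation. The sketch mines only single-predicate rules `a -> b` and scores them by support and confidence.

```python
from collections import defaultdict
from itertools import permutations


def predicate_transactions(triples):
    """Group (subject, predicate, object) triples by subject.

    Assumed configuration: each subject is one transaction whose items
    are the distinct predicates used with that subject.
    """
    tx = defaultdict(set)
    for s, p, _o in triples:
        tx[s].add(p)
    return list(tx.values())


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (a, b, support, confidence) for single predicates a -> b.

    support    = fraction of transactions containing both a and b
    confidence = (transactions containing a and b) / (transactions containing a)
    Thresholds are hypothetical defaults chosen for illustration.
    """
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for t in transactions:
        for p in t:
            item_count[p] += 1
        # Ordered pairs, so both a -> b and b -> a are counted.
        for a, b in permutations(sorted(t), 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        confidence = c / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical toy data: three subjects, two predicates.
triples = [
    ("ex:Berlin", "ex:population", "3.6M"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Paris", "ex:population", "2.1M"),
    ("ex:Paris", "ex:country", "ex:France"),
    ("ex:Potsdam", "ex:country", "ex:Germany"),
]

rules = pairwise_rules(predicate_transactions(triples))
print(rules)
```

On this toy data only `ex:population -> ex:country` passes both thresholds: it has support 2/3 and confidence 1.0. The reverse rule reaches only 2/3 confidence because `ex:Potsdam` lacks a population. Rules like this describe predicates that tend to co-occur. Using such counts to flag synonymous predicates, which one might expect to rarely co-occur on the same subject, would require additional logic that this sketch does not attempt.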
Resource description framework, Ontologies, Association rules, Data models, Data visualization, Pattern analysis
Z. Abedjan, T. Gruetze, A. Jentzsch and F. Naumann, "Profiling and mining RDF data with ProLOD++," 2014 IEEE 30th International Conference on Data Engineering (ICDE), Chicago, IL, USA, 2014, pp. 1198-1201.