Issue No. 05 - May (2017 vol. 29)
Rui Meng, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong, SAR China
Lei Chen, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong, SAR China
Yongxin Tong, State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China
Chen Zhang, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, and Shandong University of Finance and Economics
The semantic web has enabled the creation of a growing number of knowledge bases (KBs), which are designed independently using different techniques. Integration of KBs has attracted much attention because different KBs usually contain overlapping and complementary information. Automatic techniques for KB integration have improved but are still far from perfect. Therefore, in this paper, we study the problem of knowledge base semantic integration using crowd intelligence. A KB contains both classes and instances. We propose a novel hybrid framework for KB semantic integration that takes into account the semantic heterogeneity of KB class structures. We first perform semantic integration of the class structures via crowdsourcing, and then apply a blocking-based instance matching approach according to the integrated class structure. For class structure (taxonomy) semantic integration, the crowd is leveraged to help identify the semantic relationships between classes and thereby handle the semantic heterogeneity problem. Given both large-scale KBs and a limited monetary budget for crowdsourcing, we formalize the class structure (taxonomy) semantic integration problem as a Local Tree Based Query Selection (LTQS) problem. We show that the LTQS problem is NP-hard and propose two greedy-based algorithms, namely static query selection and adaptive query selection. Furthermore, because KBs are usually large and contain millions of instances, direct pairwise instance matching is inefficient. Therefore, we adopt a blocking-based strategy for instance matching that takes advantage of the class structure (taxonomy) integration result. Experiments on real large-scale KBs verify the effectiveness and efficiency of the proposed approaches.
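The blocking idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the class integration step has already produced a simple one-to-one mapping between equivalent classes, and the names `candidate_pairs`, `kb_a`, `kb_b`, and `class_map`, as well as the toy data, are hypothetical. The point is only that instances are compared within matched classes instead of across all pairs.

```python
from collections import defaultdict


def candidate_pairs(kb_a, kb_b, class_map):
    """Return instance pairs worth comparing, restricted to matched classes.

    kb_a, kb_b: dict mapping instance id -> class label (hypothetical format).
    class_map:  dict mapping a class in kb_a to its equivalent class in kb_b
                (assumed to come from the class structure integration step).
    """
    # Group the instances of kb_b into blocks keyed by class.
    blocks_b = defaultdict(list)
    for instance, cls in kb_b.items():
        blocks_b[cls].append(instance)

    pairs = []
    for instance, cls in kb_a.items():
        target = class_map.get(cls)
        if target is None:
            continue  # no matched class, so no candidates for this instance
        for other in blocks_b.get(target, []):
            pairs.append((instance, other))
    return pairs


# Toy data, invented for illustration only.
kb_a = {"a:Paris": "City", "a:Berlin": "City", "a:Einstein": "Person"}
kb_b = {"b:Paris": "Town", "b:Einstein": "Human", "b:Nile": "River"}
class_map = {"City": "Town", "Person": "Human"}

pairs = candidate_pairs(kb_a, kb_b, class_map)
all_pairs = len(kb_a) * len(kb_b)
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

In this toy case, blocking leaves 3 candidate pairs out of the 9 that exhaustive pairwise comparison would examine. A similarity function would then be applied only to those candidates.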
Taxonomy, Semantics, Knowledge based systems, Ontologies, Crowdsourcing, Data integration, Computer science
R. Meng, L. Chen, Y. Tong and C. Zhang, "Knowledge Base Semantic Integration Using Crowdsourcing," in IEEE Transactions on Knowledge & Data Engineering, vol. 29, no. 5, pp. 1087-1100, 2017.