Issue No. 05 - May (2018 vol. 30)
Zhixu Li, School of Computer Science and Technology, Soochow University, Jiangsu Sheng, China
Ying He, School of Computer Science and Technology, Soochow University, Jiangsu Sheng, China
Binbin Gu, School of Computer Science and Technology, Soochow University, Jiangsu Sheng, China
An Liu, School of Computer Science and Technology, Soochow University, Jiangsu Sheng, China
Hongsong Li, Microsoft Research Asia, Beijing, China
Haixun Wang, Microsoft Research Asia, Beijing, China
Xiaofang Zhou, School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane QLD, Australia
Semantic drift is a common problem in iterative information extraction. Previous approaches to minimizing semantic drift may incur a substantial loss in recall. We observe that most semantic drift is introduced by a small number of questionable extractions in the early rounds of iteration. These extractions subsequently introduce a large number of questionable results, which leads to the semantic drift phenomenon. We call these questionable extractions Drifting Points (DPs). If erroneous extractions are the “symptoms” of semantic drift, then DPs are its “causes”. In this paper, we propose a method to minimize semantic drift by identifying the DPs and removing the effects they introduce. We use isA (concept-instance) extraction as a running example to describe our approach to cleaning information extraction errors caused by semantic drift, and we perform experiments on different relation extraction processes over three large real-world extraction collections. The experimental results show that our DP cleaning method removes around 90 percent of the incorrect instances or patterns with about 90 percent precision, outperforming the previous approaches we compare against.
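To make the idea of "removing the effect introduced by the DPs" concrete, the following is a minimal illustrative sketch and not the paper's algorithm. It assumes that each extraction records which earlier extractions triggered it, and that the DPs have already been identified by some other means. The function name `remove_drift_effects`, the `parents` provenance map, and the toy animal data are all hypothetical.

```python
from typing import Dict, Set


def remove_drift_effects(
    parents: Dict[str, Set[str]],
    seeds: Set[str],
    drifting_points: Set[str],
) -> Set[str]:
    """Return the extractions still supported once the DPs are removed.

    parents[x] is the set of earlier extractions that triggered x
    (an assumed provenance record; any one surviving parent is enough).
    """
    # Start from the trusted seeds, excluding any seed flagged as a DP.
    kept = set(seeds) - set(drifting_points)
    changed = True
    while changed:
        changed = False
        for item, sources in parents.items():
            if item in kept or item in drifting_points:
                continue
            # Keep an extraction only if some surviving extraction supports it.
            if sources & kept:
                kept.add(item)
                changed = True
    return kept


# Toy isA example for the concept "animal" (hypothetical data).
parents = {
    "dog": set(),
    "cat": set(),
    "chicken": {"dog"},          # ambiguous: an animal and a food
    "horse": {"cat", "chicken"}, # also supported by a non-DP extraction
    "pork": {"chicken"},         # reached only through the DP
    "beef": {"pork"},            # second-generation drift
}
kept = remove_drift_effects(parents, seeds={"dog", "cat"}, drifting_points={"chicken"})
print(sorted(kept))  # ['cat', 'dog', 'horse']
```

In this sketch, "horse" survives because it has independent support from "cat", while "pork" and "beef" are dropped because every path back to the seeds passes through the DP. Recomputing support forward from the seeds, rather than deleting descendants of the DP, is one way to avoid discarding correct extractions and so limit the recall loss the abstract attributes to earlier approaches.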
Semantics, Dogs, Data mining, Syntactics, Feature extraction, Cats
Z. Li et al., "Diagnosing and Minimizing Semantic Drift in Iterative Bootstrapping Extraction," in IEEE Transactions on Knowledge & Data Engineering, vol. 30, no. 5, pp. 852-865, 2018.