2018 IEEE 34th International Conference on Data Engineering (ICDE) (2018)
Paris, France
Apr 16, 2018 to Apr 19, 2018
ISSN: 2375-026X
ISBN: 978-1-5386-5520-7
pp: 413-424
ABSTRACT
Entity categorization - the process of grouping entities into categories for some specific purpose - is an important problem with many applications, such as Google Scholar and Amazon products. In practice, however, many entities are mis-categorized. In this paper, we study the problem of discovering mis-categorized entities within a given group of entities. This problem is inherently hard: all entities within the same group have already been categorized "well" by state-of-the-art solutions, so it is nontrivial to tell them apart. We propose a novel rule-based framework to solve this problem. It first uses positive rules to compute disjoint partitions of entities; the largest partition is taken as the correctly categorized one, called the pivot partition. It then uses negative rules to identify mis-categorized entities in the other partitions, namely those that are dissimilar to the entities in the pivot partition. We describe optimizations for applying these rules and discuss how to generate positive and negative rules. Extensive experiments on two real-world datasets show the effectiveness of our solution.
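The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule functions, the toy publication records, the `DB_VENUES` set, and the choice to flag an entity only when a negative rule fires against every pivot entity are all assumptions made for illustration. Positive rules are assumed to be pairwise predicates whose matches are merged transitively.

```python
from itertools import combinations

def partition(entities, positive_rules):
    """Group entity indices into disjoint partitions via union-find.
    Two entities are merged if any positive rule matches the pair."""
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(entities)), 2):
        if any(rule(entities[i], entities[j]) for rule in positive_rules):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(entities)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def find_miscategorized(entities, positive_rules, negative_rules):
    """Return (pivot partition, flagged indices).
    Assumption: an entity is flagged only if some negative rule
    fires against every entity of the pivot partition."""
    parts = partition(entities, positive_rules)
    pivot = max(parts, key=len)  # largest partition is the pivot
    flagged = []
    for part in parts:
        if part is pivot:
            continue
        for i in part:
            if all(any(rule(entities[i], entities[p]) for rule in negative_rules)
                   for p in pivot):
                flagged.append(i)
    return pivot, sorted(flagged)

# Hypothetical toy data: publications listed on one author's page.
DB_VENUES = {"ICDE", "VLDB", "SIGMOD"}
papers = [
    {"coauthors": {"A", "B"}, "venue": "ICDE"},
    {"coauthors": {"B", "C"}, "venue": "VLDB"},
    {"coauthors": {"C"}, "venue": "SIGMOD"},
    {"coauthors": {"X"}, "venue": "Nature"},
    {"coauthors": set(), "venue": "ICDE"},
]

def shares_coauthor(a, b):          # hypothetical positive rule
    return bool(a["coauthors"] & b["coauthors"])

def different_field(a, b):          # hypothetical negative rule
    no_overlap = not (a["coauthors"] & b["coauthors"])
    return no_overlap and ((a["venue"] in DB_VENUES) != (b["venue"] in DB_VENUES))

pivot, flagged = find_miscategorized(papers, [shares_coauthor], [different_field])
print(pivot, flagged)
```

In this toy run, the first three papers form the pivot through shared coauthors, the fourth is flagged because it shares no coauthor with the pivot and appears in an unrelated venue, and the fifth is left undecided because no negative rule fires for it.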
INDEX TERMS
data mining, knowledge based systems
CITATION

S. Hao, N. Tang, G. Li and J. Feng, "Discovering Mis-Categorized Entities," 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris, France, 2018, pp. 413-424.
doi:10.1109/ICDE.2018.00045