2018 IEEE 34th International Conference on Data Engineering (ICDE) (2018)
Apr 16, 2018 to Apr 19, 2018
Entity categorization, the process of grouping entities into categories for a specific purpose, is an important problem with many applications, such as Google Scholar and Amazon products. Unfortunately, many real-world categories contain mis-categorized entities, such as publications on a researcher's Google Scholar page that were actually written by other people. We have proposed a general framework for a new research problem: discovering mis-categorized entities. In this demonstration, we have developed a Google Chrome extension, GSCleaner, as one important application of this problem. Attendees will be able to experience the following features: (1) mis-categorized entity discovery, where an attendee can check for mis-categorized entities on anyone's Google Scholar page; and (2) onsite cleaning, where an attendee can log in and clean their own Google Scholar page using GSCleaner. We describe our novel rule-based framework for discovering mis-categorized entities, and we propose effective optimization techniques for applying the rules. Empirical results show the effectiveness of GSCleaner in discovering mis-categorized entities.
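The abstract does not spell out the rules GSCleaner uses, so the following is only a hypothetical illustration of what a single rule for the Google Scholar example might look like: flag an entry when none of its listed authors has a name compatible with the profile owner's. The `Entry` type, the `name_key` normalization (first initial plus last name), and the function names are all assumptions made for this sketch, not the authors' framework or its optimizations.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical representation of one Google Scholar entry.
    title: str
    authors: list[str]


def name_key(name: str) -> tuple[str, str]:
    # Assumed normalization: "Jane Q. Doe" and "J. Doe" both map to ("j", "doe").
    parts = name.replace(".", " ").split()
    if not parts:
        return ("", "")
    return (parts[0][0].lower(), parts[-1].lower())


def is_compatible(owner: str, author: str) -> bool:
    # One hypothetical rule: same first initial and same last name.
    return name_key(owner) == name_key(author)


def flag_miscategorized(owner: str, entries: list[Entry]) -> list[Entry]:
    # Flag entries in which no listed author could be the profile owner.
    return [e for e in entries
            if not any(is_compatible(owner, a) for a in e.authors)]


if __name__ == "__main__":
    profile = [
        Entry("Paper A", ["J. Doe", "A. Smith"]),
        Entry("Paper B", ["John Roe", "A. Smith"]),
        Entry("Paper C", ["Jane Doe"]),
    ]
    for e in flag_miscategorized("Jane Doe", profile):
        print("suspicious:", e.title)
```

Run on the toy profile above, the sketch flags only "Paper B". A rule this simple cannot separate two different people who share an initial and a last name (for example, "John Doe" would also match "Jane Doe"), which is one reason a real system would need richer rules than this.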
data mining, optimisation, search engines, Web sites
S. Hao, Y. Xu, N. Tang, G. Li and J. Feng, "Cleaning Your Wrong Google Scholar Entries," 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris, France, 2018, pp. 1597-1600.