2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2016)
San Francisco, CA, USA
Aug. 18, 2016 to Aug. 21, 2016
ISBN: 978-1-5090-2847-4
pp: 358-365
Xiaotao Gu, Department of Computer Science and Technology, Tsinghua University
Hong Yang, Department of Computer Science and Technology, Tsinghua University
Jie Tang, Department of Computer Science and Technology, Tsinghua University
Jing Zhang, Department of Computer Science and Technology, Tsinghua University
ABSTRACT
The study of Web user profiling can be traced back about 30 years, with the goal of extracting "semantic"-based user profile attributes from the unstructured Web. Despite minor differences, the general method is to first identify the pages relevant to a specific user and then use machine learning models (e.g., CRFs) to extract the profile attributes from those pages. However, with the rapid growth in Web volume, such a method suffers from data redundancy and from error propagation between the two steps. In this paper, we revisit the problem of Web user profiling in the big data era and address the new challenges it brings. We propose a simple yet effective approach for extracting user profile attributes from the Web using big data. To avoid error propagation, the approach processes all the extraction subtasks in one unified model. To further incorporate human knowledge and improve extraction performance, we propose a Markov logic factor graph (MagicFG) model. The MagicFG model expresses human knowledge as first-order logic formulas and incorporates them into the extraction model. Our experiments on a real data set show that the proposed method significantly improves extraction performance (+4–6%; p ≪ 0.01, t-test) compared with several baseline methods.
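The abstract does not spell out MagicFG's factors or logic rules, so the following is only a rough illustration of the general idea it names: combining feature-based factors with a weighted first-order rule in one log-linear factor graph, in the style of Markov logic networks. It is not the authors' model. The candidate addresses, the two features, all weights, and the "a user has at most one email" rule are hypothetical, and exact inference by enumerating every joint assignment is used only because the toy example has three variables.

```python
import itertools
import math

# Hypothetical candidate emails for one user, each with two made-up binary
# features: [appears_near_user_name, domain_matches_affiliation].
CANDIDATES = {
    "jdoe@tsinghua.edu.cn": [1.0, 1.0],
    "info@example.com": [0.0, 0.0],
    "doe.j@example.org": [1.0, 0.0],
}
FEATURE_WEIGHTS = [1.5, 1.0]  # assumed values; a real model would learn these
BIAS = -1.0                   # assumed prior against labelling a candidate "true"


def local_score(features):
    """Log-potential of the unary (feature) factor when a candidate is labelled 1."""
    return BIAS + sum(w * f for w, f in zip(FEATURE_WEIGHTS, features))


def at_most_one_email(assignment):
    """Hypothetical soft rule: Email(u, e1) AND Email(u, e2) => e1 = e2.

    Returns 1.0 when the joint assignment satisfies the rule, else 0.0.
    """
    return 1.0 if sum(assignment) <= 1 else 0.0


def marginals(candidates, rule_weight):
    """Exact marginals P(y_i = 1) by enumerating all 2^n joint assignments."""
    names = list(candidates)
    feats = [candidates[n] for n in names]
    potentials = {}
    for assignment in itertools.product((0, 1), repeat=len(names)):
        # Sum of unary log-potentials for candidates labelled 1 ...
        log_phi = sum(y * local_score(f) for y, f in zip(assignment, feats))
        # ... plus the weighted logic factor, added only when the rule holds.
        log_phi += rule_weight * at_most_one_email(assignment)
        potentials[assignment] = math.exp(log_phi)
    z = sum(potentials.values())  # partition function
    return {
        name: sum(p for a, p in potentials.items() if a[i] == 1) / z
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    for weight in (0.0, 2.0):
        probs = marginals(CANDIDATES, rule_weight=weight)
        print(weight, {k: round(v, 3) for k, v in probs.items()})
```

With `rule_weight=0.0` the candidates are independent and each marginal reduces to the sigmoid of its local score. With `rule_weight=2.0` the rule couples the variables, so the weaker second candidate (`doe.j@example.org`) drops from roughly 0.62 to roughly 0.34 while the strongest candidate stays the most probable. Realistic graphs are far too large for enumeration and would need approximate inference (for example, loopy belief propagation).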
INDEX TERMS
Electronic mail, Data mining, Redundancy, Feature extraction, Web pages, Google, Big data
CITATION

X. Gu, H. Yang, J. Tang and J. Zhang, "Web user profiling using data redundancy," 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, USA, 2016, pp. 358-365.
doi:10.1109/ASONAM.2016.7752259