Issue No.05 - May (2013 vol.25)
pp: 1001-1014
Ziyu Guan, University of California, Santa Barbara, Santa Barbara
Gengxin Miao, University of California, Santa Barbara, Santa Barbara
Russell McLoughlin, Lawrence Livermore National Laboratory, Livermore
Xifeng Yan, University of California, Santa Barbara, Santa Barbara
Deng Cai, Zhejiang University, Hangzhou
ABSTRACT
Expert search has been studied in different contexts, e.g., enterprises and academic communities. We examine a general expert search problem: searching for experts on the web, where millions of webpages and thousands of names are considered. It poses two main challenges: 1) webpages can be of varying quality and full of noise; 2) the expertise evidence scattered across webpages is usually vague and ambiguous. We propose to leverage the large amount of co-occurrence information to assess the relevance and reputation of a person name for a query topic. The co-occurrence structure is modeled using a hypergraph, on which a heat-diffusion-based ranking algorithm is proposed. Query keywords are regarded as heat sources, and a person name that has a strong connection with the query (i.e., frequently co-occurs with query keywords and co-occurs with other names related to query keywords) will receive most of the heat, and is thus ranked high. Experiments on the ClueWeb09 web collection show that our algorithm is effective for retrieving experts and outperforms baseline algorithms significantly. This work can be regarded as one step toward addressing the more general entity search problem without sophisticated NLP techniques.
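To make the heat-source idea concrete, the following is a minimal Python sketch of diffusion over a co-occurrence hypergraph. It is not the algorithm from the paper, whose exact formulation the abstract does not give. It assumes each webpage is one hyperedge containing the keywords and person names that appear on it, and that at each step a vertex sends a fraction of its heat equally to its pages, which pass it on equally to their members. The function name `diffuse_heat`, the parameters `gamma` and `steps`, and the toy pages are all hypothetical.

```python
from collections import defaultdict


def diffuse_heat(pages, query_terms, gamma=1.0, steps=20):
    """Return a vertex -> heat mapping after a discretised diffusion.

    pages: iterable of collections; each one is a hyperedge (the keywords
        and names that co-occur on a page). Assumed representation.
    query_terms: vertices that start as heat sources.
    gamma, steps: hypothetical knobs; gamma / steps of the heat moves per step.
    """
    pages = [set(p) for p in pages]

    # Map each vertex to the indices of the hyperedges that contain it.
    membership = defaultdict(list)
    for idx, page in enumerate(pages):
        for vertex in page:
            membership[vertex].append(idx)

    # One unit of heat is split equally among the query terms that occur.
    sources = {q for q in query_terms if q in membership}
    if not sources:
        return {}
    heat = {v: 0.0 for v in membership}
    for q in sources:
        heat[q] = 1.0 / len(sources)

    alpha = gamma / steps
    for _ in range(steps):
        # Heat each vertex would receive if every vertex spread all of its
        # heat equally over its pages, and each page equally over members.
        flow = {v: 0.0 for v in heat}
        for v, h in heat.items():
            if h == 0.0:
                continue
            per_page = h / len(membership[v])
            for idx in membership[v]:
                per_member = per_page / len(pages[idx])
                for u in pages[idx]:
                    flow[u] += per_member
        # Move only a fraction alpha of the heat; total heat is conserved.
        heat = {v: (1 - alpha) * heat[v] + alpha * flow[v] for v in heat}
    return heat


# Toy data (invented for illustration): four pages, one query keyword.
pages = [
    {"hypergraph", "Alice"},
    {"hypergraph", "Alice", "Bob"},
    {"cooking", "Carol"},
    {"Bob", "Carol"},
]
heat = diffuse_heat(pages, ["hypergraph"])
names = ["Alice", "Bob", "Carol"]
ranking = sorted(names, key=heat.get, reverse=True)
print(ranking)
```

In this toy setting, a name that shares pages with the query keyword receives heat directly, while a name that only co-occurs with such a name receives a smaller amount indirectly, which mirrors the two kinds of connection the abstract describes.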
INDEX TERMS
Web pages, Search problems, Noise, Computational modeling, Space heating, Conductivity, diffusion, Expert search, web mining, hypergraph
CITATION
Ziyu Guan, Gengxin Miao, Russell McLoughlin, Xifeng Yan, Deng Cai, "Co-Occurrence-Based Diffusion for Expert Search on the Web", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 5, pp. 1001-1014, May 2013, doi:10.1109/TKDE.2012.49