Learning Contextual Dependency Network Models for Link-Based Classification
November 2006 (vol. 18 no. 11)
pp. 1482-1496
Links among objects contain rich semantics that can be very helpful in classifying the objects. However, real-world link data such as Web pages often contain many irrelevant links. These noisy and irrelevant links usually provide no useful, predictive information for categorization, so it is important to identify automatically which links are most relevant for categorization. In this paper, we present a contextual dependency network (CDN) model for classifying linked objects in the presence of noisy and irrelevant links. The CDN model uses a dependency function that characterizes the contextual dependencies among linked objects. In this way, CDNs can differentiate the impacts of related objects on the classification and consequently reduce the effect of irrelevant links. We show how to learn the CDN model effectively and how to use the Gibbs inference framework over the learned model for collective classification of multiple linked objects. Experiments show that the CDN model is relatively robust on data sets containing irrelevant links.
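The abstract does not spell out the CDN's dependency function or conditional model, so the following is only a minimal sketch of the general idea under stated assumptions. Each link carries a hypothetical relevance weight `w_ij` standing in for the dependency function; the conditional distribution of a node's class is an assumed log-linear combination of a content-only classifier's output (`local_probs`) and a hypothetical class-compatibility table (`compat`); and collective classification uses plain Gibbs sampling with a burn-in period. The names `conditional`, `gibbs_collective`, `local_probs`, and `compat` are illustrative and do not come from the paper.

```python
import math
import random
from collections import Counter


def conditional(node, labels, local_probs, neighbors, compat, classes):
    """Return P(y_node = c | x_node, current neighbour labels) for each class c.

    Assumed log-linear form (not the paper's exact model):
        log p_local(c | x_node) + sum_j w_ij * log compat[c][y_j]
    where w_ij is a hypothetical per-link dependency weight; w_ij = 0 makes
    link (i, j) irrelevant. local_probs and compat entries must be > 0.
    """
    scores = []
    for c in classes:
        s = math.log(local_probs[node][c])
        for j, w in neighbors.get(node, []):
            s += w * math.log(compat[c][labels[j]])
        scores.append(s)
    top = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return {c: wt / z for c, wt in zip(classes, weights)}


def gibbs_collective(local_probs, neighbors, compat, classes, observed=None,
                     burn_in=100, samples=1000, seed=0):
    """Estimate label marginals of unobserved objects by Gibbs sampling."""
    rng = random.Random(seed)
    observed = dict(observed or {})
    # Observed labels stay fixed; unobserved ones start at the content-only argmax.
    labels = dict(observed)
    free = [n for n in local_probs if n not in observed]
    for n in free:
        labels[n] = max(classes, key=lambda c: local_probs[n][c])
    counts = {n: Counter() for n in free}
    for sweep in range(burn_in + samples):
        for n in free:
            # Resample each unobserved node given the current labels of its neighbours.
            dist = conditional(n, labels, local_probs, neighbors, compat, classes)
            labels[n] = rng.choices(classes, weights=[dist[c] for c in classes])[0]
            if sweep >= burn_in:
                counts[n][labels[n]] += 1
    return {n: {c: counts[n][c] / samples for c in classes} for n in free}


if __name__ == "__main__":
    classes = ["course", "faculty"]
    local = {"a": {"course": 0.5, "faculty": 0.5}}  # content alone is ambiguous
    observed = {"b": "course", "c": "faculty"}
    # a-b is treated as a relevant link, a-c as an irrelevant one (weight 0).
    neighbors = {"a": [("b", 1.0), ("c", 0.0)]}
    compat = {"course": {"course": 0.9, "faculty": 0.1},
              "faculty": {"course": 0.1, "faculty": 0.9}}
    print(gibbs_collective(local, neighbors, compat, classes, observed))
```

In this toy setting, giving the link to `c` a weight of 0 removes its influence, so page `a` is pulled toward the class of `b` alone; with equal weights on both links, the two neighbours would cancel out. This illustrates why down-weighting irrelevant links matters for collective inference; how CDNs actually define and learn the dependency function is described in the paper, not in this sketch.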

Index Terms:
Data dependencies, hypertext/hypermedia, machine learning, link-based classification, link context, contextual dependency networks, Gibbs inference.
Yonghong Tian, Qiang Yang, Tiejun Huang, Charles X. Ling, Wen Gao, "Learning Contextual Dependency Network Models for Link-Based Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 11, pp. 1482-1496, Nov. 2006, doi:10.1109/TKDE.2006.178