Issue No. 1, Jan. 2013 (vol. 25), pp. 15-28
Jing Gao , University at Buffalo, The State University of New York, Buffalo
Feng Liang , University of Illinois at Urbana-Champaign, Urbana
Wei Fan , Huawei Noah's Ark Lab, Shatin
Yizhou Sun , University of Illinois at Urbana-Champaign, Urbana
Jiawei Han , University of Illinois at Urbana-Champaign, Urbana
ABSTRACT
Ensemble learning has emerged as a powerful method for combining multiple models. Well-known methods such as bagging, boosting, and model averaging have been shown to improve accuracy and robustness over single models. However, due to the high cost of manual labeling, it is often hard to obtain labeled data that are sufficient and reliable enough for effective training. Meanwhile, unlabeled data are usually plentiful, and multiple unsupervised models can readily be obtained from them. Although unsupervised models do not directly generate class-label predictions for individual objects, they provide useful constraints on the joint predictions for sets of related objects. Incorporating these unsupervised models into an ensemble of supervised models can therefore lead to better prediction performance. In this paper, we study ensemble learning with outputs from multiple supervised and unsupervised models, a topic that has received little attention. We propose to consolidate a classification solution by maximizing the consensus among both the supervised predictions and the unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors smoothness of the predictions over the graph while penalizing deviations from the initial labeling provided by the supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes and prove the optimality of the solution. The proposed method can be interpreted as conducting a constrained embedding in a transformed space, or as ranking on the graph. Experimental results on several applications with heterogeneous data sources demonstrate the benefits of the proposed method over existing alternatives. (More information, data, and code are available at http://www.cse.buffalo.edu/~jing/integrate.htm.)
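The iterative propagation described in the abstract can be sketched compactly. Below is a minimal NumPy illustration of consensus maximization on the object-group bipartite graph: object nodes and group nodes alternately average each other's class-probability estimates, with group nodes that originate from supervised models pulled toward their initial labels. The function name bgcm_propagate, the default penalty weight alpha, and the toy data are illustrative assumptions of ours, not the authors' released implementation (which is linked above).

    import numpy as np

    def bgcm_propagate(A, Y, labeled, alpha=2.0, n_iters=200, tol=1e-6):
        # A       : (n_objects, n_groups) 0/1 membership matrix; A[i, j] = 1
        #           if object i falls in group j (a predicted class of a
        #           supervised model, or a cluster of an unsupervised model).
        # Y       : (n_groups, n_classes) initial labeling; one-hot rows for
        #           groups produced by supervised models, zero rows otherwise.
        # labeled : (n_groups,) boolean mask marking supervised group nodes.
        # alpha   : deviation penalty (an assumed default; tuned in practice).
        n, c = A.shape[0], Y.shape[1]
        U = np.full((n, c), 1.0 / c)              # object-node class estimates
        deg_obj = A.sum(axis=1, keepdims=True)    # every object joins >= 1 group
        deg_grp = A.sum(axis=0)[:, None]          # every group has >= 1 member
        lab = labeled.astype(float)[:, None]
        for _ in range(n_iters):
            # Group nodes average their members' estimates, shrunk toward the
            # initial labels when the group came from a supervised model.
            Q = (A.T @ U + alpha * lab * Y) / (deg_grp + alpha * lab)
            # Object nodes average the estimates of the groups containing them.
            U_new = (A @ Q) / deg_obj
            if np.abs(U_new - U).max() < tol:
                return U_new, Q
            U = U_new
        return U, Q

    # Toy run: 4 objects, 2 classes; groups 0-1 come from one classifier,
    # groups 2-3 from one clustering algorithm.
    A = np.array([[1, 0, 1, 0],
                  [1, 0, 1, 0],
                  [0, 1, 1, 0],
                  [0, 1, 0, 1]])
    Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
    labeled = np.array([True, True, False, False])
    U, _ = bgcm_propagate(A, Y, labeled)
    predictions = U.argmax(axis=1)   # consolidated class label per object

Each row of U sums to one, so the consolidated label for object i is the argmax of row i; alpha trades off smoothness of the predictions over the graph against fidelity to the supervised models' initial labeling.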
INDEX TERMS
Motion pictures, data models, predictive models, optimization, bipartite graph, labeling, social network services, semi-supervised learning, knowledge integration, ensemble learning, clustering ensemble
CITATION
Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han, "A Graph-Based Consensus Maximization Approach for Combining Multiple Supervised and Unsupervised Models," IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 1, pp. 15-28, Jan. 2013, doi:10.1109/TKDE.2011.206